Chapter 3 — Diffusion Modeling with Pupil-Linked Arousal (Response-Signal Design)

Author

Mohammad Dastgheib

Abstract

This chapter presents a hierarchical Wiener diffusion decision model (DDM) for a response-signal change-detection task in older adults. The primary model maps task difficulty to drift rate (v), boundary separation (a), and starting-point bias (z), with small condition effects on non-decision time (t₀). We report comprehensive quality assurance checks, manipulation checks independent of the DDM, model comparison via LOO cross-validation, and extensive posterior predictive checks with emphasis on subject-wise mid-body RT quantiles.

1 Introduction

1.2 Insights from Drift-Diffusion Modeling of Aging

To disentangle these component processes of decision-making, researchers increasingly turn to computational modeling approaches. One powerful framework is the Drift Diffusion Model (DDM), which provides a quantitative decomposition of choice reaction time data into psychologically interpretable parameters (Ratcliff, 1978; Ratcliff & McKoon, 2008; Voss et al., 2004). In a forced-choice decision task, the DDM conceptualizes the process as a gradual, noisy accumulation of evidence toward one of two decision boundaries (representing the response options). The key model parameters include:

  • Drift rate (\(v\)) – the average speed of evidence accumulation toward the correct decision. This reflects the quality or efficiency of information processing; a higher drift rate means the decision maker can extract and use task-relevant information more quickly.
  • Decision boundary (\(a\)) – the amount of evidence required to commit to a choice, often interpreted as response caution or threshold. A larger boundary separation indicates a more cautious strategy (waiting for more evidence before deciding), whereas a smaller boundary implies a more impulsive or speed-emphasizing strategy.
  • Starting point (\(z\)) – the initial bias or predisposition toward one option or the other before evidence is accumulated. If the starting point is centered (typically 0.5 in relative units), there is no pre-existing bias; deviations from center indicate an a priori bias to favor a particular response (e.g., a bias to say “yes” vs “no” in a detection task).
  • Non-decision time (\(t_0\)) – the duration of processes outside of the evidence accumulation itself, such as perceptual encoding of the stimulus and the motor execution of the response (Voss et al., 2004). Non-decision time accounts for aspects of the reaction time that are not decision-related.

Figure 1: Drift Diffusion Model: Evidence Accumulation Process. The DDM models decision-making as a noisy evidence accumulation process. Evidence accumulates from a starting point (z, starting-point bias) toward one of two decision boundaries: the upper boundary corresponds to “different” responses, and the lower boundary to “same” responses. The rate of evidence accumulation is determined by the drift rate (v), which reflects the quality of the stimulus evidence. The distance between boundaries is the boundary separation (a), which reflects response caution. The total reaction time (RT) is the sum of non-decision time (t₀, reflecting perceptual encoding and motor execution) and the decision time (time to reach a boundary). In this example, the starting point is biased toward the upper boundary (z > 0.5), and positive drift leads to a “different” response.

Formally, the evidence accumulation process follows the stochastic differential equation (Ratcliff, 1978; Ratcliff & McKoon, 2008):

\[dX(t) = v \, dt + dW(t) \tag{1}\]

where \(X(t)\) is the accumulated evidence at time \(t\), \(v\) is the drift rate (evidence strength), and \(dW(t)\) is the increment of a standard Wiener process (unit diffusion coefficient). The process starts at \(X(0) = z \cdot a\), where \(z \in (0,1)\) is the starting-point bias (expressed as a proportion of boundary separation) and \(a > 0\) is the boundary separation (Ratcliff & McKoon, 2008). The decision process terminates when \(X(t)\) first reaches either the upper boundary (\(X(t) = a\)) or the lower boundary (\(X(t) = 0\)). The total reaction time (RT) is the sum of the decision time and the non-decision time (\(t_0\)):

\[\text{RT} = t_{\text{decision}} + t_0 \tag{2}\]
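Equations 1 and 2 can be made concrete with a small simulation. The following is a minimal Python sketch, not the estimation code used in this chapter: it discretizes Eq. 1 with an Euler-Maruyama step and adds \(t_0\) as in Eq. 2. The function name, the step size `dt`, the time limit `max_t`, and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate_ddm_trial(v, a, z, t0, dt=0.001, max_t=5.0, rng=None):
    """Simulate one trial of Eq. 1 with an Euler-Maruyama scheme.

    Returns (rt, choice): choice is 1 for the upper boundary, 0 for the
    lower boundary, and None if no boundary is reached within max_t.
    """
    rng = rng or random.Random()
    x = z * a                      # start at X(0) = z * a
    t = 0.0
    sd = math.sqrt(dt)             # SD of a unit-variance Wiener increment
    while t < max_t:
        x += v * dt + sd * rng.gauss(0.0, 1.0)
        t += dt
        if x >= a:
            return t + t0, 1       # Eq. 2: RT = decision time + t0
        if x <= 0.0:
            return t + t0, 0
    return None, None


# Illustrative parameter values only (not estimates from this chapter).
rng = random.Random(1)
trials = [simulate_ddm_trial(v=1.5, a=1.5, z=0.5, t0=0.3, rng=rng) for _ in range(500)]
finished = [(rt, c) for rt, c in trials if c is not None]
p_upper = sum(c for _, c in finished) / len(finished)
print(f"P(upper) = {p_upper:.2f}, min RT = {min(rt for rt, _ in finished):.3f} s")
```

With a positive drift most simulated trials end at the upper boundary, and no simulated RT can be shorter than \(t_0\), which is the sense in which non-decision time shifts the whole RT distribution.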

By fitting the DDM to participants’ accuracy and response time distributions, one can infer how these latent parameters differ between groups (e.g. young vs. old) or conditions. This model-based approach has proved especially illuminating in aging research. Rather than relying on overall slowing measures alone, the DDM allows researchers to pinpoint which aspects of processing slow down or change with age and which remain intact.

The DDM mathematically formalizes the behavioral patterns described above. The model confirms that older adults’ slower responses are largely due to shifts in caution and peripheral processing, rather than uniformly impaired evidence accumulation. Specifically, older adults consistently exhibit significantly higher decision thresholds (\(a\)) than younger adults across a variety of tasks (Ratcliff et al., 2004; Ratcliff & McKoon, 2008), quantitatively capturing the strategic slowing: by widening the distance between boundaries, older adults counteract internal noise and maintain high accuracy. Additionally, the model isolates the contribution of peripheral slowing; older adults typically show increased non-decision times (\(t_0\)), often 80–100 ms longer than young adults (Ratcliff et al., 2004), reflecting age-related delays in motor execution and sensory encoding. Crucially, the DDM reveals that drift rate (\(v\))—the core measure of cognitive processing efficiency—is often remarkably preserved in aging for simple perceptual tasks (Ratcliff et al., 2001, 2003, 2004).

However, as noted previously, this preservation is not universal: in tasks taxing memory retrieval (Spaniol et al., 2006) or complex visual search (Madden & Allen, 1991), modeling confirms a decline in drift rates, indicating that for complex cognitive operations, the older brain does accumulate evidence more slowly. Furthermore, while older adults can adjust their boundaries, they often exhibit a rigidity in this setting; Starns and Ratcliff (2010) demonstrated that older adults fail to lower their boundaries (\(a\)) optimally under speed pressure, prioritizing accuracy even when the task incentivizes speed. Regarding bias (\(z\)), healthy aging is generally not associated with systematic shifts in starting point for simple tasks, though specific biases can emerge in memory paradigms (e.g., a conservative bias against “new” items to avoid false alarms) (Ratcliff et al., 2004; Spaniol et al., 2006). Overall, the application of drift-diffusion modeling provides a nuanced portrait of aging: it mathematically separates the strategic adaptations (increased \(a\)) and peripheral slowing (increased \(t_0\)) from the fundamental cognitive capacity (drift rate \(v\)), which remains intact in simple contexts but declines under high cognitive load.

Having established the baseline DDM profile of older adults, we turn to an important modulating factor: arousal and effort. In this chapter, we leverage the diffusion model to investigate how fluctuations in arousal (induced via physical effort) can alter these latent decision processes in older adults.

1.3 Arousal, Effort, and Decision Performance in Older Adults

Beyond baseline aging effects, cognitive performance is strongly influenced by the organism’s arousal state—the level of alertness or activation of physiological and neural systems. Classic theory, dating back to the Yerkes–Dodson law (Yerkes & Dodson, 1908), holds that the relationship between arousal and performance follows an inverted-U function: increasing arousal enhances performance up to an optimal point, after which further arousal (especially if reaching stress or anxiety levels) impairs performance. In the context of aging, this dynamic takes on special significance. Adaptive Gain Theory (AGT) (Aston-Jones & Cohen, 2005) provides a neural mechanism for this relationship, linking phasic and tonic Locus Coeruleus activity to optimal task performance. When this framework is extended to aging, researchers posit that the arousal–performance curve is altered, often manifesting as a leftward shift or compression of the inverted-U function (Mather & Harley, 2016; Mikneviciute et al., 2022). This implies that older adults may reach their “optimal” arousal peak at lower levels of objective demand than younger adults. Consequently, levels of effort or stress that might be engaging or beneficial for a younger adult (placing them at the peak of the curve) can push an older adult onto the “descending limb,” leading to supra-optimal arousal and performance decrements (Huang & Clewett, 2024; Mather & Harley, 2016).

Older adults typically have a reduced physiological capacity to sustain high arousal yet often need to exert greater mental effort to perform a given task at the same level as a younger person. Recent studies support the idea that effortful engagement is more taxing for older adults in measurable ways. For example, Hess and Ennis (2012) demonstrated that when older adults performed continuous arithmetic tasks (e.g., subtraction), they exhibited significantly larger increases in systolic blood pressure (SBP), a physiological index of effortful arousal, than young adults, and this elevated physiological cost predicted greater fatigue on subsequent tasks. Furthermore, research directly relevant to the current paradigm has shown that concurrent physical effort can be detrimental to cognition in aging. Azer et al. (2023) found that while maintaining a concurrent moderate isometric handgrip (30% MVC), older adults showed significantly reduced accuracy in a visual working memory task with distractors, whereas younger adults remained unaffected. This supports the limited-capacity framework, suggesting that shared processing resources (Wickens, 2008) are more easily depleted in older adults, or that the combined demand drives arousal into a dysregulated state (Verhaeghen et al., 2003).

What are the expected effects of arousal fluctuations on the decision-making mechanisms of older adults? By applying the drift-diffusion model (DDM), we can make specific hypotheses about how effort-induced arousal will modulate latent decision parameters. The present study manipulates arousal via physical effort (5% vs. 40% MVC), providing a controlled way to “push” older participants along the arousal curve. Behavioral theories such as Resource Competition and Limited Capacity predict that excessive effort will siphon processing resources away from the decision process, degrading evidence quality, whereas Adaptive Gain Theory predicts that older adults will more quickly slide down the descending limb of the inverted-U curve when demands are high. These frameworks collectively motivate the computational predictions outlined after we review the relevant physiological mechanisms.

1.4 Arousal Dynamics and the Locus Coeruleus-Norepinephrine System

Beyond the structural decision parameters captured by the DDM, cognitive performance is dynamically modulated by the brain’s arousal state. A central regulator of this arousal is the Locus Coeruleus-Norepinephrine (LC-NE) system; the LC is a small brainstem nucleus that serves as the primary source of norepinephrine to the forebrain (Aston-Jones & Cohen, 2005). The LC-NE system modulates the “neural gain” of cortical circuits, essentially the signal-to-noise ratio of information processing. According to Adaptive Gain Theory (AGT), optimal performance relies on a balance between two modes of LC activity: a moderate tonic (baseline) firing rate that promotes focused attention, and robust phasic (event-related) bursts that facilitate rapid behavioral responses to task-relevant stimuli (Aston-Jones & Cohen, 2005; Gilzenrat et al., 2010).

In the context of aging, this system undergoes significant changes. While structural degradation of the LC is common in older adults (Mather & Harley, 2016), functional compensatory mechanisms often emerge. Older adults may exhibit chronically elevated tonic arousal or hyper-responsivity to challenge, potentially as a strategy to offset neural inefficiency (Lee et al., 2018; Mather et al., 2016). However, this compensation has limits; Adaptive Gain Theory suggests that the relationship between arousal and performance follows an inverted-U function (Yerkes & Dodson, 1908), which in older adults may be shifted or compressed (Mather & Harley, 2016). Consequently, levels of physical or cognitive effort that would optimize arousal in younger adults might push older adults into a supra-optimal state (the “descending limb” of the curve), where excessive norepinephrine release leads to distractibility, indiscriminate processing, and performance decrements (Aston-Jones & Cohen, 2005; Eldar et al., 2013).

Pupillometry provides a powerful, non-invasive window into these LC-NE dynamics. Because pupil diameter tracks LC firing activity with high temporal precision (Joshi et al., 2016; Murphy et al., 2014), it serves as a proxy for both tonic and phasic arousal states.

  • Baseline Pupil Diameter: Reflects tonic LC activity and general alertness levels (Gilzenrat et al., 2010).

  • Task-Evoked Pupil Response (TEPR): Reflects phasic LC activation and the mobilization of mental effort during task execution (Beatty, 1982). Specifically, the amplitude and latency of the TEPR—often quantified as the Area Under the Curve (AUC)—have been linked to the subjective difficulty of a task and the cognitive resources recruited to perform it (Kahneman & Beatty, 1966; Van Gerven et al., 2004).

Crucially, recent computational work has begun to bridge the gap between these physiological arousal signals and the latent cognitive processes of decision-making. In a seminal study, Cavanagh et al. (2014) demonstrated that eye tracking and pupillometry serve as indicators of dissociable latent decision processes. By applying hierarchical Bayesian DDM to a probabilistic learning task, they found that while gaze dwell time predicted the rate of evidence accumulation (drift rate), pupil dilation specifically predicted an increase in the decision threshold (\(a\)) during high-conflict choices. This finding fundamentally reframed the role of phasic arousal in decision-making: rather than merely “energizing” the system generally, the pupil-linked arousal response can act as a specific signal for cognitive control, prompting a “hold your horses” mechanism (Frank, 2006) that raises the decision boundary to prevent impulsive errors. This link between pupil dilation and threshold adjustment (\(a\)) has since been corroborated by others, suggesting it may be a general marker of decision uncertainty and conflict monitoring (Urai et al., 2017; Wel & Steenbergen, 2018).

This connection is vital for the present study. By integrating pupillometry with DDM, we can move beyond simple behavioral outcomes to ask mechanistic questions about how physical effort impacts the aging brain. Does the physical arousal from a high-effort handgrip act as a beneficial boost that sharpens neural gain (increasing drift rate, \(v\)), as Adaptive Gain Theory might predict for moderate arousal? Or, does it trigger a conflict signal that prompts older adults to become more conservative (increasing threshold, \(a\)), as suggested by the work of Cavanagh et al. (2014)? Alternatively, if the effort pushes older adults into a supra-optimal state, does the pupil signal reflect internal noise that degrades evidence quality (decreasing \(v\))? This combined pupil-DDM approach allows us to directly test these competing hypotheses by linking observable physiological states to the latent computational components of the decision process.

1.4.1 Hypotheses and Predictions

Grounded in the behavioral and physiological frameworks above, we test four preregistered predictions:

  1. Drift rate (\(v\)) will decrease under high effort (40% MVC) relative to low effort, reflecting degraded evidence accumulation from resource competition or supra-optimal arousal.
  2. Boundary separation (\(a\)) will increase under high effort, reflecting the conflict-control signal suggested by phasic pupil-linked arousal and older adults’ strategic caution.
  3. Non-decision time (\(t_0\)) may increase modestly under high effort, reflecting cognitive-motor interference during concurrent grip maintenance.
  4. Starting bias (\(z\)) may move toward 0.5 if high-effort trials evoke strong phasic LC-NE responses that “reset” pre-existing response tendencies (Gee et al., 2020), with the magnitude of any shift moderated by LC integrity (Huang & Clewett, 2024).

1.5 Conclusion and Overview of the Present Study

In this chapter, we investigate how effort-induced arousal modulates decision-making in older adults at a computational level. By applying hierarchical Bayesian drift-diffusion modeling (HDDM) to behavioral data from older participants under low-effort vs. high-effort conditions, we test whether heightened arousal degrades evidence accumulation, triggers compensatory increases in boundary separation, alters bias, or slows non-decision processes. By decomposing older adults’ performance with the DDM, we can pinpoint the locus of the arousal effect: is it degrading the evidence itself, shifting the strategic criterion, resetting biases, or altering peripheral processing speeds? This approach allows us to move beyond simple outcome measures (mean reaction time or accuracy) to visualize the mechanism of how a physiologically loaded aging brain arrives at a choice. The broader significance of this work lies in understanding whether the characteristic cautiousness and processing inefficiencies of older decision-makers are fixed traits or dynamic features modulated by physiological state. If arousal can “tune” decision parameters in predictable ways, it suggests that decision performance in aging is not static but state-dependent. If excessive effort proves detrimental as hypothesized, it highlights the critical importance of effort regulation and stress management for older individuals in demanding environments. Ultimately, by mathematically decomposing these effects, we seek to clarify whether boosting arousal in older adults helps “overclock” their decision processes or instead exacerbates underlying capacity limits. This knowledge contributes to a more comprehensive theory of cognitive aging—one that accounts for both the baseline architectural changes of the brain and the dynamic, moment-to-moment influence of internal physiological states.

2 Methods

2.1 DDM Implementation

2.1.2 Hierarchical Structure

For subject \(i\) and trial \(j\), the model parameters are (Bürkner, 2017; Ratcliff & McKoon, 2008):

\[v_{ij} = \beta_{v,0} + \sum_k \beta_{v,k} X_{k,ij} + u_{v,i} \tag{3}\]

\[a_{ij} = \exp\left(\beta_{\text{bs},0} + \sum_k \beta_{\text{bs},k} X_{k,ij} + u_{\text{bs},i}\right) \tag{4}\]

\[t_{0,ij} = \exp\left(\beta_{\text{ndt},0} + \sum_k \beta_{\text{ndt},k} X_{k,ij}\right) \tag{5}\]

\[z_{ij} = \text{logit}^{-1}\left(\beta_{\text{bias},0} + \sum_k \beta_{\text{bias},k} X_{k,ij} + u_{\text{bias},i}\right) \tag{6}\]

where \(\beta_{0}\) are population-level intercepts, \(\beta_k\) are population-level coefficients for predictors \(X_k\) (e.g., task, effort condition, difficulty level), and \(u_i \sim \mathcal{N}(0, \sigma^2_u)\) are subject-level random effects. Note that \(t_0\) is modeled without subject-level random effects to maintain model stability in the response-signal design.
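The role of the link functions in Eqs. 3–6 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the fitted model: the coefficient values, the single dummy-coded predictor, and the helper names `inv_logit` and `trial_parameters` are all hypothetical. It shows how unbounded linear predictors are mapped to a real-valued drift rate, a positive boundary separation and non-decision time, and a starting point between 0 and 1.

```python
import math


def inv_logit(x):
    """Inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def trial_parameters(beta, x, u):
    """Map linear predictors to DDM parameters as in Eqs. 3-6.

    beta: dict of (intercept, [slopes]) per parameter
    x:    list of predictor values X_k for this trial
    u:    dict of subject-level offsets (none for ndt, as in Eq. 5)
    """
    def eta(name):
        b0, slopes = beta[name]
        return b0 + sum(b * xk for b, xk in zip(slopes, x))

    return {
        "v": eta("v") + u["v"],                      # identity link
        "a": math.exp(eta("bs") + u["bs"]),          # log link keeps a > 0
        "t0": math.exp(eta("ndt")),                  # log link, no random effect
        "z": inv_logit(eta("bias") + u["bias"]),     # logit link keeps 0 < z < 1
    }


# Hypothetical coefficients for one dummy-coded predictor (high effort = 1).
beta = {"v": (1.0, [-0.2]), "bs": (0.5, [0.1]), "ndt": (-1.5, [0.05]), "bias": (0.0, [0.0])}
u = {"v": 0.1, "bs": -0.05, "bias": 0.2}
params = trial_parameters(beta, x=[1.0], u=u)
print({k: round(val, 3) for k, val in params.items()})
```

Because \(a\) and \(t_0\) are modeled on the log scale, their coefficients act multiplicatively on the natural scale, whereas drift-rate coefficients act additively.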

2.1.3 Likelihood Function

The likelihood for a single trial with RT \(t\) and decision \(d \in \{\text{"same"}, \text{"different"}\}\) follows the Wiener first-passage time distribution (Feller, 1968):

\[p(t, d | v, a, t_0, z) = \text{Wiener}(t - t_0 | v, a, z) \tag{7}\]

where the Wiener distribution gives the probability density of the first-passage time to boundary \(d\) given drift \(v\), boundary separation \(a\), and starting point \(z \cdot a\).
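To make Eq. 7 tangible, the sketch below evaluates a Wiener first-passage density in plain Python using the classical large-time series representation. This is an assumption-laden illustration: the models in this chapter rely on Stan's own implementation, and the function names, the number of series terms, and the integration grid here are choices made only for the example. As a sanity check, integrating the density over RT should recover the probability of each response, and the two probabilities should sum to one.

```python
import math


def wiener_lower_density(t, v, a, w, n_terms=100):
    """First-passage density at the lower boundary at decision time t.

    Uses the large-time series for a unit-variance Wiener process with
    drift v, boundary separation a, and relative starting point w.
    """
    if t <= 0.0:
        return 0.0
    u = t / (a * a)                               # time rescaled to a = 1
    series = sum(
        k * math.exp(-0.5 * (k * math.pi) ** 2 * u) * math.sin(k * math.pi * w)
        for k in range(1, n_terms + 1)
    )
    return (math.pi / (a * a)) * math.exp(-v * a * w - 0.5 * v * v * t) * series


def wiener_density(rt, upper, v, a, t0, z):
    """Density of Eq. 7 for one trial; upper=True means a 'different' response."""
    t = rt - t0                                   # decision time
    if upper:
        return wiener_lower_density(t, -v, a, 1.0 - z)   # reflect the process
    return wiener_lower_density(t, v, a, z)


def boundary_probability(upper, v, a, t0, z, dt=0.002, t_max=8.0):
    """Integrate the density over RT (midpoint rule) to get a choice probability."""
    n = int(t_max / dt)
    return sum(wiener_density(t0 + (i + 0.5) * dt, upper, v, a, t0, z) * dt for i in range(n))


# Illustrative parameter values only.
p_up = boundary_probability(True, v=1.0, a=1.0, t0=0.3, z=0.5)
p_lo = boundary_probability(False, v=1.0, a=1.0, t0=0.3, z=0.5)
print(round(p_up, 3), round(p_lo, 3))
```

The upper-boundary density is obtained by reflecting the process (replacing \(v\) with \(-v\) and \(z\) with \(1 - z\)), which is why a single series suffices for both responses.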

2.2 Decision Coding

We employed response-side coding (also called “stimulus coding” or “response coding”) where the upper boundary corresponds to “different” responses and the lower boundary corresponds to “same” responses (see Figure 1), rather than accuracy-based coding where boundaries represent correct/incorrect responses. This specification is necessary to disentangle response bias (a preference for one response alternative regardless of stimulus truth) from discriminability (drift rate) (Ratcliff & McKoon, 2008; Wiecki et al., 2013).

In accuracy-based coding, bias would imply a preference for being correct (which is conceptually trivial), whereas response-side coding allows us to model the meaningful preference for the “same” response option observed in detection tasks. This is particularly important for same/different discrimination tasks where participants often exhibit specific response biases (e.g., a conservative “same” bias) rather than general accuracy biases. Previous work linking arousal to decision-making has demonstrated that phasic arousal reduces response biases in detection tasks (Gee et al., 2020), and capturing this effect requires mapping boundaries to response alternatives.

On Standard (Δ=0) trials, participants chose “same” on 89.1% of trials and “different” on 10.9%, consistent with a conservative response tendency. The inclusion of Standard trials provides a critical constraint for estimating bias. Although Standard trials carry no objective evidence for a difference (Δ=0), the model estimated a strongly negative drift rate (v ≈ -1.26 in the primary model), indicating that participants actively accumulate evidence toward “same” responses when stimuli are identical. The observed preference for “same” responses therefore reflects the combined effects of drift and starting-point bias, with drift dominating the decision process. This bias estimate would be unobtainable with accuracy-coded models: there the “correct” boundary corresponds to “same” on Standard trials but to “different” on Easy and Hard trials, so a consistent preference for one response cannot be expressed as a single starting point. Response-side coding was implemented directly from the raw data using the resp_is_diff column, which explicitly records whether each trial was a “different” response (TRUE) or “same” response (FALSE), ensuring accurate mapping to DDM boundary assignments.
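The difference between the two coding schemes can be shown on a handful of trials. This Python sketch is illustrative only: `resp_is_diff` mirrors the column named above, while `stim_is_diff` and both helper functions are hypothetical names introduced for the example.

```python
def response_coded(resp_is_diff):
    """Response-side coding: 1 = upper boundary ('different'), 0 = lower ('same')."""
    return 1 if resp_is_diff else 0


def accuracy_coded(resp_is_diff, stim_is_diff):
    """Accuracy coding: 1 = correct, 0 = error (shown only for contrast)."""
    return 1 if resp_is_diff == stim_is_diff else 0


# Hypothetical trials: (stimulus differs?, participant said 'different'?)
trials = [(False, False), (False, True), (True, True), (True, False)]
for stim_is_diff, resp_is_diff in trials:
    print(stim_is_diff, resp_is_diff,
          response_coded(resp_is_diff),
          accuracy_coded(resp_is_diff, stim_is_diff))
```

Under response-side coding a “same” response always maps to the lower boundary, whereas under accuracy coding the same response maps to “correct” on Standard trials and to “error” on Easy and Hard trials, so a preference for “same” has no fixed location in the accuracy-coded model.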

2.3 Computational Methods

All analyses were performed using R version 4.5.2 (2025-10-31) “[Not] Part in a Rumble” (R Core Team, 2025) on macOS (aarch64-apple-darwin20, Apple Silicon). Bayesian hierarchical models were fitted using brms (Bürkner, 2017, 2018) with CmdStan (Stan Development Team, 2024) via cmdstanr (Gabry & Češnovar, 2021) as the backend. Model comparison was conducted using leave-one-out cross-validation via the loo package (Vehtari et al., 2017). Data manipulation and visualization used dplyr (Wickham, François, et al., 2023), tidyr (Wickham, Vaughan, et al., 2023), readr (Wickham et al., 2024), and ggplot2 (Wickham, 2016). Tables were generated using gt (Iannone et al., 2024). Posterior analysis and diagnostics used the posterior package (Bürkner et al., 2022). Code development and debugging were performed using Cursor (AI-assisted code editor), and the document was rendered using Quarto (Allaire et al., 2022).

MCMC Sampling Specifications:

  • Algorithm: NUTS (No-U-Turn Sampler)
  • Chains: 4
  • Iterations: 8,000 per chain (4,000 warmup, 4,000 sampling)
  • Convergence criteria: \(\hat{R}\) ≤ 1.01 (Gelman & Rubin, 1992; Vehtari et al., 2021), minimum bulk/tail ESS ≥ 400
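As a rough illustration of the convergence criterion, the following Python sketch computes the classic split-\(\hat{R}\) on simulated draws. It is a simplified stand-in, not the diagnostic actually reported: the rank-normalized version of Vehtari et al. (2021) adds further steps, and the simulated "chains" and function name here are assumptions for the example.

```python
import random
import statistics


def split_rhat(chains):
    """Classic split-R-hat for one parameter; chains is a list of equal-length draw lists."""
    half = len(chains[0]) // 2
    # Split each chain in half so within-chain trends also inflate R-hat.
    parts = [c[i * half:(i + 1) * half] for c in chains for i in range(2)]
    n = half
    means = [statistics.fmean(p) for p in parts]
    within = statistics.fmean([statistics.variance(p) for p in parts])   # W
    between = n * statistics.variance(means)                             # B
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(7)
# Four simulated "chains" of 4,000 post-warmup draws each, as in the settings above.
good = [[rng.gauss(0.0, 1.0) for _ in range(4000)] for _ in range(4)]
bad = good[:3] + [[x + 1.0 for x in good[3]]]     # one chain stuck elsewhere
print(round(split_rhat(good), 3), round(split_rhat(bad), 3))
```

Chains that sample the same distribution give a value very close to 1, while a single displaced chain pushes the statistic well above the 1.01 threshold.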

2.4 Sample & Experimental Design

2.4.1 Participants

Participants were 67 older adults (≥65 years; mean age = 71.3 years, SD = 4.8). This analysis uses the same dataset and participants as described in the LC behavioral report manuscript (see References). All participants provided informed consent and received course credit or financial compensation for participation. Study procedures were approved by the Institutional Review Board of the University of California, Riverside, and all experimental procedures were performed in accordance with the approved guidelines and regulations.

Note: 12 participants performed at or near chance (accuracy ≤55%, where chance is 50%) in some conditions but were retained to maximize sample size, as hierarchical modeling borrows strength across subjects to stabilize their estimates. Sensitivity analyses confirmed their inclusion did not alter main effects.

2.4.2 Tasks and Conditions

Tasks: Auditory Detection Task (ADT) and Visual Detection Task (VDT) were modeled jointly with ‘task’ as a fixed effect. This approach uses a single random effect variance parameter for subject-level variability across both tasks, allowing the model to share information between tasks and stabilize subject-specific estimates through hierarchical shrinkage while estimating task-specific offsets. [Detailed task descriptions, stimulus parameters, and equipment specifications are provided in the LC behavioral report manuscript; see References.]

Conditions (within-subjects, factorial design):

  • Difficulty: Standard (Δ=0), Easy, Hard
  • Effort: Low (5% MVC), High (40% MVC)

Total design cells: 2 tasks × 3 difficulty levels × 2 effort conditions = 12 cells per subject.

Total trials analyzed: 17,834 (after exclusions). Standard trials: 3,597 (20.2%).
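The factorial structure described above can be enumerated directly. In this short Python sketch the level labels are illustrative assumptions (only `Low_5_MVC` appears elsewhere in this chapter as a reference level).

```python
from itertools import product

tasks = ["ADT", "VDT"]
difficulty = ["Standard", "Easy", "Hard"]
effort = ["Low_5_MVC", "High_40_MVC"]     # second label is a hypothetical name

cells = list(product(tasks, difficulty, effort))
print(len(cells))          # 2 x 3 x 2 design cells per subject
for cell in cells[:3]:
    print(cell)
```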

2.5 Trial Timeline (Response-Signal Design)

Task design. Both Auditory and Visual tasks followed the same trial structure but differed in stimuli. Each trial began with a 1.5–4.5 s jittered blank gray screen, followed by a 3 s grip gauge instructing low (5% MVC) or high (40% MVC) force with on-screen feedback. After a 0.25 s blank and 0.5 s fixation, a stimulus pair was presented. Auditory task: a 1000 Hz 0.1 s standard tone, a 0.5 s ISI, then a 0.1 s comparison tone that was either identical or increased by 8, 16, 32, or 64 Hz. Visual task: a central Gabor (1.5 cycles/degree, 0.2 Michelson contrast, 4°) followed by a 0.5 s blank and a second Gabor that was identical or increased in contrast by 0.06, 0.12, 0.24, or 0.48. Participants then released the grip and had 3 s to report “same” or “different,” followed by 3 s to rate their confidence in their same/different choice (1–4, low to high). All 5 stimulus levels and the 2 grip levels were presented in equal proportions across 150 trials per task, in pseudorandom order.

RT definition: Time from response-screen onset (response-signal design). This is a critical methodological detail: RTs are measured from when the response screen appears (after the stimulus presentation period), not from stimulus onset. This design constrains the interpretation of t₀ (non-decision time) to primarily reflect motor execution and response selection rather than the sum of encoding + motor time as in traditional RT tasks. The response-signal design rationale is described in detail in the LC behavioral report manuscript.

Filtering: The DDM analysis applies a 250 ms lower-bound cutoff for anticipatory responses. While a 150–200 ms cutoff is standard for young adult populations (Whelan, 2008), research consistently demonstrates that older adults exhibit significantly longer non-decision times (\(t_0\), often written \(T_{er}\) in the literature), reflecting age-related slowing in stimulus encoding and motor execution. Specifically, drift diffusion modeling in aging populations estimates that \(t_0\) is approximately 80–100 ms longer in older adults compared to their younger counterparts (Ratcliff et al., 2001, 2004). Consequently, a 250 ms threshold provides a conservative lower bound that adjusts for this physiological shift, ensuring that excluded trials represent genuine non-decisional reflexes rather than the leading edge of the valid decision distribution (Woods et al., 2015). During preprocessing, trials with RT < 200 ms were excluded (see Trial Exclusions section below). No additional trials were excluded at the 250 ms threshold, as all remaining trials had RT ≥ 250 ms. The upper bound of 3.000 s reflects the maximum response window in the task design; no upper-bound filtering was applied post-experiment.
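The RT window described above amounts to a simple inclusion rule. The following Python sketch is illustrative: the constant and function names and the example RTs are hypothetical, and the actual preprocessing was not necessarily implemented this way.

```python
RT_MIN = 0.250   # lower bound in seconds (anticipatory responses)
RT_MAX = 3.000   # upper bound in seconds (response window)


def keep_trial(rt):
    """True if an RT (seconds from response-screen onset) is inside the analysis window."""
    return rt is not None and RT_MIN <= rt <= RT_MAX


# Hypothetical RTs; None stands for a missed response.
rts = [0.180, 0.250, 0.912, 2.999, None, 3.2]
kept = [rt for rt in rts if keep_trial(rt)]
print(kept)
```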

2.5.1 Data Quality Assurance

2.5.1.1 Trial Exclusions

Trial exclusions were applied during data preprocessing. The following table summarizes exclusions by filter type:

Trial Exclusions Summary

| Filter Applied | Trials Remaining | Trials Removed | % Remaining | % Removed (cumulative) |
|---|---|---|---|---|
| Starting trials | 19,740 | 0 | 100.00 | 0.00 |
| RT < 200 ms¹ | 19,495 | 245 | 98.76 | 1.24 |
| Missed responses | 19,194 | 301 | 97.23 | 2.77 |
| Invalid run performance | 16,958 | 2,236 | 85.91 | 14.09 |
| Final trials (Preprocessing) | 16,958 | 2,782 | 85.91 | 14.09 |
| Restored (Audit) | 17,243 | −285 | 87.35 | 12.65 |
| Final Analysis N | 17,243 | 0 | 87.35 | 12.65 |

¹ RT < 200 ms: Anticipatory responses excluded during preprocessing. The DDM analysis used a 250 ms cutoff, but no additional trials were excluded. 285 trials were restored after a decision coding audit confirmed their validity.

Summary: Of 19,740 starting trials, 2,782 trials (14.1%) were excluded during preprocessing:

  • 245 trials (1.2%) excluded for RT < 200 ms (anticipatory responses)
  • 301 trials (1.5%) excluded for missed responses
  • 2,236 trials (11.3%) excluded for invalid run performance
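The exclusion counts and percentages above can be re-derived from the starting total. This Python sketch simply repeats that arithmetic; the variable names are illustrative.

```python
start = 19_740
removed = {"RT < 200 ms": 245, "Missed responses": 301, "Invalid run performance": 2_236}

remaining = start
for label, n in removed.items():
    remaining -= n
    print(f"{label:<25} remaining={remaining:>6,}  "
          f"step={100 * n / start:5.2f}%  cumulative={100 * (start - remaining) / start:5.2f}%")

restored = 285   # trials restored after the decision coding audit
final_n = remaining + restored
print(f"After audit: {final_n:,} ({100 * final_n / start:.2f}% of starting trials)")
```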

Final dataset after preprocessing: 16,958 trials (85.9% retention) from 65 subjects. Additional data processing steps (e.g., decision coding verification, quality checks) resulted in the final analysis dataset of 17,834 trials from 67 subjects. Two additional subjects were included after verification of their data quality during the decision coding audit. The dataset was updated to use the latest raw behavioral data file (bap_beh_trialdata_v2.csv) with direct response-side coding from the resp_is_diff column. Note: The DDM analysis applies a more conservative 250 ms lower-bound cutoff (see Filtering section above) based on age-related non-decision time shifts, but no additional trials were excluded as all remaining trials had RT ≥ 250 ms.

2.5.2 Subject Inclusion & Decision Coding Audit

Subject Inclusion:

  • Total subjects: 67
  • Sub-chance performers (≤55% accuracy): 12
  • Mean overall accuracy: 63.3%

Decision Coding Audit:

  • Total trials: 17,243
  • Decision coding mismatches: 0
  • Mismatch rate: 0.0000

Result: All 67 subjects were retained; no sub-chance performers were excluded. Decision coding verification confirmed zero mismatches across all trials. Decision coding methodology is discussed in detail in the Decision Coding section above.

2.5.3 Manipulation Checks

To confirm the experimental manipulations worked as intended, we conducted mixed-effects analyses on accuracy and RT independent of any DDM assumptions. Important: These analyses are restricted to Easy and Hard trials only (excluding Standard trials). Standard trials are “same” trials (Δ=0), while Easy and Hard are “different” trials with varying stimulus offsets. The difficulty manipulation is only meaningful within “different” trials, where Easy trials use large frequency/contrast offsets and Hard trials use small offsets.

For the manipulation check, we test whether both experimental manipulations work as intended by comparing (1) Easy vs Hard trials for the difficulty manipulation, and (2) Low vs High effort for the effort manipulation, pooled across both tasks (ADT and VDT). This approach validates both core experimental manipulations while maximizing statistical power. Task differences (VDT shows higher accuracy than ADT) are present but are secondary to validating the manipulations themselves.

2.5.3.1 Accuracy: Generalized Linear Mixed Model

Model: decision ~ difficulty + effort + (1 | subject), restricted to Easy and Hard trials only (N = 13,771 trials, pooled across ADT and VDT). Reference levels: Easy, Low_5_MVC.

Accuracy GLMM Results

| Term | β | SE | Statistic | p | 95% CI |
|---|---|---|---|---|---|
| Intercept | 2.06 | 0.124 | 16.58 | <.001 | [1.81, 2.30] |
| Difficulty: Hard | -2.97 | 0.050 | -59.54 | <.001 | [-3.07, -2.88] |
| Effort: High | -0.15 | 0.045 | -3.30 | <.001 | [-0.23, -0.06] |

Key findings:

  • Hard vs. Easy: Hard trials showed substantially lower accuracy than Easy (β = -2.97, p < .001). Easy trials had 85.2% accuracy, while Hard trials had 30.5% accuracy (well below chance). This reflects the increased difficulty of detecting small frequency/contrast differences on “different” trials, demonstrating a strong effect of stimulus difference magnitude on discrimination performance.
  • High vs. Low Effort: High effort (40% MVC) showed slightly lower accuracy than Low effort (5% MVC) (β = -0.15, p < .001). Low effort had 58.5% accuracy, while High effort had 56.8% accuracy. This suggests that the physical effort required on High effort trials may interfere with cognitive performance, potentially through dual-task resource competition between maintaining grip force and performing the discrimination task.
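As a rough check on the coefficient scale, the estimates in the accuracy table can be mapped back to probabilities with the inverse logit. This is a minimal sketch that assumes a logit link (not stated explicitly above) and evaluates the fixed effects at a subject random intercept of zero, so the implied values describe a typical subject and need not equal the raw cell accuracies.

```python
import math

def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Fixed-effect estimates copied from the accuracy GLMM table (assumed log-odds scale).
intercept = 2.06      # Easy, Low effort (reference cell)
beta_hard = -2.97     # Hard vs. Easy
beta_high = -0.15     # High vs. Low effort

# Linear predictor for each difficulty x effort cell at a random intercept of zero.
cells = {
    ("Easy", "Low"): intercept,
    ("Hard", "Low"): intercept + beta_hard,
    ("Easy", "High"): intercept + beta_high,
    ("Hard", "High"): intercept + beta_hard + beta_high,
}
implied = {cell: inv_logit(eta) for cell, eta in cells.items()}

for (difficulty, effort), p in implied.items():
    print(f"{difficulty:4s} / {effort:4s} effort: implied accuracy = {p:.3f}")
```

Under these assumptions the Easy and Hard reference cells land near 0.89 and 0.29, in the same range as the observed 85.2% and 30.5%.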

2.5.3.2 RT: Linear Mixed Model on Median RT

Model: rt_median ~ difficulty + effort + (1 | subject), restricted to Easy and Hard trials only (N = 13,771 trials, pooled across ADT and VDT). Reference levels: Easy, Low_5_MVC.

RT LMM Results

| Term | β (seconds) | SE | Statistic | 95% CI |
| --- | --- | --- | --- | --- |
| Intercept | 0.792 | 0.0335 | 23.64 | [0.727, 0.858] |
| Difficulty: Hard | 0.232 | 0.0174 | 13.28 | [0.197, 0.266] |
| Effort: High | 0.016 | 0.0175 | 0.92 | [-0.018, 0.050] |

Key findings:

  • Hard vs. Easy: Hard trials were slower than Easy (β = 0.23 s, 95% CI [0.20, 0.27]). Easy trials had a median RT of 0.75 s (mean 0.90 s), while Hard trials had a median RT of 1.01 s (mean 1.12 s). This reflects slower decision-making when stimulus differences are smaller and harder to detect.
  • High vs. Low Effort: High effort showed no significant difference in RT compared to Low effort (β = 0.02 s, 95% CI [-0.02, 0.05]). Low effort had a median RT of 0.86 s (mean 1.00 s), while High effort had a median RT of 0.89 s (mean 1.02 s). The effort manipulation did not significantly affect reaction time, suggesting that the dual-task demands primarily affected accuracy rather than response speed.
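The intervals in the RT table can be approximately reproduced as Wald intervals (estimate ± 1.96 × SE). The sketch below assumes that this is how the reported intervals were built, which the text does not state. It uses estimates rounded to three decimals and SEs rounded to four, so agreement is expected only to about the third decimal.

```python
# Wald 95% intervals from the RT LMM table (estimates and SEs rounded).
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

rows = {
    "Intercept": (0.792, 0.0335),
    "Difficulty: Hard": (0.232, 0.0174),
    "Effort: High": (0.016, 0.0175),
}

def wald_ci(estimate, se, z=Z_975):
    """Return the (lower, upper) bounds of estimate +/- z * se."""
    half_width = z * se
    return estimate - half_width, estimate + half_width

intervals = {term: wald_ci(est, se) for term, (est, se) in rows.items()}
for term, (lo, hi) in intervals.items():
    excludes_zero = lo > 0 or hi < 0
    print(f"{term:18s} [{lo:.3f}, {hi:.3f}]  excludes 0: {excludes_zero}")
```

The effort interval spans zero while the difficulty interval does not, which matches the reading given in the key findings.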

Conclusion: Both experimental manipulations worked as intended. The difficulty manipulation (Easy vs. Hard within “different” trials) showed strong effects on both accuracy and RT in theoretically expected directions: larger stimulus differences (Easy) led to higher accuracy (85.2% vs. 30.5%) and faster RTs (0.75 s vs. 1.01 s median) compared to smaller differences (Hard). The effort manipulation (Low vs. High MVC) showed a small but significant effect on accuracy, with High effort slightly reducing accuracy (56.8% vs. 58.5%), likely due to dual-task resource competition. However, effort did not significantly affect RT. These results validate the experimental design prior to DDM analysis.

2.6 Model Specifications

2.6.1 Standard-Only Bias Calibration Model

To isolate bias identification from drift, we fit a single hierarchical Wiener DDM to Standard trials only (3,597 trials from 67 subjects). The model uses parameter-specific formulas to specify how predictors map onto each DDM parameter:

  • Drift (v): rt | dec(dec_upper) ~ 1 + (1|subject_id) with relaxed prior normal(0, 2) to allow for potential negative drift
  • Boundary (a/bs): bs ~ 1 + (1|subject_id) — intercept + subject random effects
  • Non-decision time (t₀/ndt): ndt ~ 1 — intercept-only (response-signal design)
  • Bias (z): bias ~ task + effort_condition + (1|subject_id) — task/effort effects + subject random effects

Note: These formulas are all part of one model fitted simultaneously. The bf() function in brms allows specification of separate formulas for each DDM parameter (drift, boundary, non-decision time, bias) within a single hierarchical model.

Drift prior rationale: While Standard trials theoretically have zero evidence (Δ=0), we used a relaxed drift prior (normal(0, 2)) rather than a tight prior to allow the model to capture any systematic drift patterns that might emerge from the data. This approach recognizes that even on Standard trials, participants may accumulate evidence toward “same” responses, which is consistent with the observed 89.1% “same” response rate. A tight prior forcing drift to zero would be inappropriate if participants are systematically accumulating evidence toward one boundary. The relaxed prior allows the model to estimate drift and bias jointly, with both parameters contributing to the observed choice proportions.

2.6.2 Joint Confirmation Model

A full hierarchical model using all trials (17,834 trials) constrained Standard drift to ≈0 (tight prior normal(0, 0.04)) and allowed drift differences only for non-Standard trials (Easy/Hard) via an is_nonstd indicator:

  • Drift (v): rt | dec(dec_upper) ~ 0 + difficulty_level + task:is_nonstd + effort_condition:is_nonstd + (1|subject_id) — separate coefficients per difficulty, task/effort effects only for non-Standard
  • Boundary (a/bs): bs ~ difficulty_level + task + (1|subject_id) — difficulty + task effects + subject random effects
  • Non-decision time (t₀/ndt): ndt ~ task + effort_condition — task/effort effects, no random effects
  • Bias (z): bias ~ difficulty_level + task + (1|subject_id) — difficulty + task effects + subject random effects

This joint model confirms the bias estimates from the Standard-only model while providing additional information about difficulty effects.

2.6.3 Primary Analysis Model

The primary model is a single hierarchical Wiener DDM that includes difficulty effects on v, a, and z, with task and effort as additive factors. The model uses parameter-specific formulas:

  • Drift (v): rt | dec(dec_upper) ~ difficulty_level + task + effort_condition + (1 + difficulty_level | subject_id)
  • Boundary (a/bs): bs ~ difficulty_level + task + (1 | subject_id)
  • Non-decision time (t₀/ndt): ndt ~ task + effort_condition (no random effects)
  • Bias (z): bias ~ difficulty_level + task + (1 | subject_id)

Note: As in the Standard-only model, these formulas are fitted jointly within a single brms model via bf(). The dec_upper variable (1 = “different”, 0 = “same”) is taken directly from the resp_is_diff column of the raw data, so boundary assignment follows the recorded response side.

Rationale for ndt formula: In the response-signal design, t₀ primarily reflects motor execution. To avoid identifiability issues and maintain model stability, we modeled t₀ with group-level task and effort effects only, omitting subject-level random effects. The response-signal task design and its implications for DDM parameter interpretation are described in the LC behavioral report manuscript (see References).

2.6.4 Priors

All priors are weakly informative and set on the link scale:

Intercepts:

  • v Intercept ~ Normal(0, 1)
  • bs Intercept ~ Normal(log(1.7), 0.30) → a ≈ 1.7 on natural scale
  • ndt Intercept ~ Normal(log(0.23), 0.12) → t₀ ≈ 230 ms on natural scale
  • bias Intercept ~ Normal(0, 0.5) → z ≈ 0.5 (no bias) on probability scale

Slopes:

  • v slopes: Normal(0, 0.6–0.7)
  • bs slopes: Normal(0, 0.25–0.30)
  • bias slopes: Normal(0, 0.35)

Random effects:

  • Standard deviations: Student-t(3, 0, 0.30)
  • Correlations: LKJ(2)

Sampling controls: NUTS with adapt_delta = 0.995, max_treedepth = 15. Four chains, 8,000 iterations (4,000 warmup).
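To see what the intercept priors imply on the natural scale, the central 95% range of each prior can be pushed through its inverse link. The sketch assumes log links for bs and ndt and a logit link for bias, as suggested by the arrows in the list above. The variable names are illustrative only.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# (link-scale mean, link-scale SD, assumed inverse link) for each intercept prior.
priors = {
    "bs (a)":   (math.log(1.7),  0.30, math.exp),
    "ndt (t0)": (math.log(0.23), 0.12, math.exp),
    "bias (z)": (0.0,            0.50, inv_logit),
}

natural = {}
for name, (mu, sd, inverse_link) in priors.items():
    lower = inverse_link(mu - Z_975 * sd)
    centre = inverse_link(mu)            # prior median on the natural scale
    upper = inverse_link(mu + Z_975 * sd)
    natural[name] = (lower, centre, upper)
    print(f"{name:9s} median = {centre:.3f}, central 95% = [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions the t₀ prior places most of its mass between roughly 0.18 and 0.29 s, and the bias prior between roughly 0.27 and 0.73.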

2.6.4.1 Prior vs. Posterior for Non-Decision Time

t₀ Prior vs Posterior. Prior (gray line) and posterior (blue shaded density) distributions for the t₀ intercept. The prior is Normal(log(0.23), 0.12) on the log scale (≈0.23 s on natural scale). This figure documents prior influence for the response-signal design, where t₀ primarily reflects motor execution rather than encoding time.

Interpretation: The posterior for t₀ is well-informed by the data while remaining compatible with the weakly informative prior, confirming adequate identifiability for the group-level intercept despite the response-signal design.

2.7 Model Comparison & Diagnostics

2.7.1 Model Comparison

We compared 10 candidate models varying in how difficulty, task, and effort map onto DDM parameters. Leave-one-out cross-validation (LOO-CV) was used to select the best-fitting model.

2.7.1.1 LOO Summary

Model Comparison: LOO-CV Results

| Model | ELPD | SE | P_loo |
| --- | --- | --- | --- |
| Difficulty → v + a + z | -17007.01 | 148.39 | 192.35 |

Winner: The model with difficulty → (v + a + z) is strongly favored.

  • ΔELPD vs. v-only: ≈ +185 (SE ≈ 20)
  • Stacking weight: ≈ 0.89
  • PBMA weight: ≈ 1.0

Pareto-k diagnostics: 1/17,834 observations had k > 0.7; moment matching was not required.

Model Comparison: Leave-One-Out Cross-Validation. ELPD (Expected Log-Predictive Density) with 95% SE bars by model. The best model (highest ELPD) is indicated with a dashed red line. ΔELPD values (difference from best) are annotated above each point. Larger ELPD indicates better out-of-sample predictive accuracy.

Interpretation: The data strongly support a model in which task difficulty modulates drift rate, boundary separation, and starting-point bias simultaneously. Simpler models (e.g., difficulty affecting only drift) are decisively rejected by cross-validation.
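The strength of the comparison can be gauged by expressing ΔELPD in units of its standard error and by computing a plain pseudo-BMA weight. The sketch below is restricted to two models (the winner and the drift-only model) and omits the Bayesian bootstrap, so it only illustrates why a difference of this size yields a weight near 1. It does not reproduce the reported weights, which come from the full 10-model comparison.

```python
import math

# Reported comparison of the winning model against the drift-only model (approximate).
delta_elpd = 185.0   # ELPD(v + a + z) minus ELPD(v only)
se_delta = 20.0      # standard error of that difference

# Number of standard errors separating the two models.
z_ratio = delta_elpd / se_delta

# Two-model pseudo-BMA weight for the winner:
# exp(elpd_1) / (exp(elpd_1) + exp(elpd_2)), rewritten via the difference to avoid overflow.
weight_winner = 1.0 / (1.0 + math.exp(-delta_elpd))

print(f"Delta ELPD / SE = {z_ratio:.2f}")
print(f"two-model pseudo-BMA weight for v + a + z = {weight_winner:.6f}")
```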

2.7.2 Model Diagnostics

Convergence & PPC Gate (Primary Model)

| Field | Value |
| --- | --- |
| model_file | fit_primary_vza_vEff_censored.rds |
| timestamp | 2025-11-19 13:07:23 |
| conv_max_rhat | 1.003 |
| conv_min_bulk_ess | 804.755 |
| conv_min_tail_ess | NA |
| conv_divergences | 0 |
| conv_pass | TRUE |
| loo_elpd | -14758.47 |
| loo_se | 147.406 |
| loo_max_pareto_k | NA |
| loo_n_high_k | NA |
| ppc_subj_n_cells | 12 |
| ppc_subj_n_flagged_qp | 12 |
| ppc_subj_n_flagged_ks | 12 |
| ppc_subj_n_flagged_midbody | 12 |
| ppc_subj_n_flagged_any | 12 |
| ppc_subj_pct_flagged_qp | 100 |
| ppc_subj_pct_flagged_ks | 100 |
| ppc_subj_pct_flagged_midbody | 100 |
| ppc_subj_pct_flagged_any | 100 |
| ppc_subj_max_qp | 0.356 |
| ppc_subj_max_ks | 0.318 |
| ppc_subj_max_midbody | 0.234 |
| ppc_subj_median_acc | 0.815 |
| ppc_subj_pass | FALSE |
| ppc_cond_n_flagged | 12 |
| ppc_cond_pct_flagged | 100 |
| ppc_cond_max_qp | 0.187 |
| ppc_cond_max_ks | 0.363 |
| gate_pass | FALSE |

Convergence criteria:

  • Max \(\hat{R}\) ≤ 1.01 ✓
  • Min bulk ESS ≥ 400 ✓
  • Min tail ESS ≥ 400 (recorded as NA in the gate table above, so not verified here)
  • Divergent transitions = 0 ✓

PPC thresholds (pre-declared):

  • Subject-wise mid-body QP RMSE ≤ 0.09 s
  • |Δ accuracy| ≤ 0.05
  • KS statistic ≤ 0.15
  • ≤ 15% of cells flagged

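Result: The primary model passes the recorded MCMC convergence gates (max \(\hat{R}\) = 1.003, min bulk ESS ≈ 805, 0 divergent transitions; conv_pass = TRUE). The overall gate is not met (gate_pass = FALSE) because of the PPC criteria; PPC performance is discussed in detail below.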
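The convergence criteria can be written as a small gate function. This is a hypothetical helper, not the pipeline that produced the table. In particular, treating a missing value as "unverified" rather than as a pass or a fail is an assumption made here for illustration.

```python
import math

# Values copied from the gate table; tail ESS is recorded there as NA.
diagnostics = {
    "max_rhat": 1.003,
    "min_bulk_ess": 804.755,
    "min_tail_ess": float("nan"),
    "divergences": 0,
}

def check(value, passes):
    """Return 'pass', 'fail', or 'unverified' when the value is missing (NaN)."""
    if isinstance(value, float) and math.isnan(value):
        return "unverified"
    return "pass" if passes(value) else "fail"

results = {
    "max_rhat": check(diagnostics["max_rhat"], lambda x: x <= 1.01),
    "min_bulk_ess": check(diagnostics["min_bulk_ess"], lambda x: x >= 400),
    "min_tail_ess": check(diagnostics["min_tail_ess"], lambda x: x >= 400),
    "divergences": check(diagnostics["divergences"], lambda x: x == 0),
}

# convergence_pass ignores unverified criteria; fully_verified requires every one to pass.
convergence_pass = all(status != "fail" for status in results.values())
fully_verified = all(status == "pass" for status in results.values())

for name, status in results.items():
    print(f"{name:13s} {status}")
print("convergence_pass:", convergence_pass, "| fully_verified:", fully_verified)
```

Separating the two flags makes it explicit that no recorded criterion fails, while one criterion could not be evaluated from the table.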

3 Results

3.1 Bias Estimates (Standard-Only Model)

With the relaxed drift prior, the Standard-only bias model estimated a negative drift rate on Standard trials (posterior mean v = -1.404, 95% CrI [-1.662, -1.147]), indicating that participants actively accumulated evidence toward the “same” response. The starting point lay slightly above the unbiased value of 0.5 (posterior mean z = 0.567, 95% CrI [0.534, 0.601]), a slight bias toward “different”. The strong negative drift nevertheless dominates the decision process and produces the observed high proportion (89.1%) of “same” responses. This pattern suggests that the conservative response strategy is driven by evidence processing (perceiving sameness as a signal) rather than by a shift in starting point. VDT showed less bias toward “different” than ADT on the logit scale (contrast Δ = -0.157, 95% CrI [-0.232, -0.081], P(Δ>0) < 0.001; see the contrast table below), indicating modality-specific differences in response bias. Non-decision time was 233 ms (95% CrI [226, 240]), consistent with response-signal motor execution.

Bias Levels (z parameter, natural scale)

| Condition | Mean | 2.5% | 97.5% |
| --- | --- | --- | --- |
| ADT, Low effort | 0.573 | 0.540 | 0.604 |
| ADT, High effort | 0.580 | 0.547 | 0.612 |
| VDT, Low effort | 0.534 | 0.501 | 0.566 |
| VDT, High effort | 0.541 | 0.509 | 0.573 |

Bias Contrasts (Standard-Only Model)

| Contrast | Mean Δ (logit) | 2.5% | 97.5% | P(Δ>0) |
| --- | --- | --- | --- | --- |
| VDT vs. ADT (bias, logit) | -0.157 | -0.232 | -0.081 | 0 |
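The logit-scale contrast can be cross-checked against the natural-scale bias levels. The sketch below takes the logit of the posterior mean z in each cell. This is only an approximation to the posterior mean of the logit-scale contrast, since the logit of a mean is not the mean of the logits.

```python
import math

def logit(p):
    """Map a probability to the log-odds scale."""
    return math.log(p / (1.0 - p))

# Posterior mean bias levels from the bias-levels table (natural scale).
z_adt = {"Low": 0.573, "High": 0.580}
z_vdt = {"Low": 0.534, "High": 0.541}

# VDT minus ADT on the logit scale, separately for each effort level.
contrast = {
    effort: logit(z_vdt[effort]) - logit(z_adt[effort])
    for effort in z_adt
}
for effort, delta in contrast.items():
    print(f"{effort:4s} effort: logit(z_VDT) - logit(z_ADT) = {delta:.3f}")
```

Both effort levels give a difference of about -0.158, close to the tabulated contrast of -0.157.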

Starting-Point Bias (z) by Task Modality. Bar plot showing mean bias estimates with 95% credible intervals (error bars) for ADT and VDT tasks. Values above 0.5 indicate bias toward ‘Different’ (upper boundary). ADT shows higher bias (z = 0.573, 95% CrI: [0.540, 0.604]) compared to VDT (z = 0.534, 95% CrI: [0.501, 0.566]). This task-specific difference in starting-point bias suggests that participants adopt different decision criteria for auditory versus visual discrimination tasks.

The primary model (see Difficulty Effects section) estimated a similar negative drift for Standard trials (v ≈ -1.26), so the pattern found by the Standard-only calibration model (v = -1.404, 95% CrI [-1.662, -1.147]; 89.1% “same” responses) holds across both models.

3.2 Fixed Effects

3.2.1 Forest Plots by Task

Fixed Effects: ADT (Auditory Detection Task). Posterior means (link scale) with 95% CrIs for drift (v), boundary separation (a/bs), and starting-point bias (z). In the additive model, difficulty and effort contrasts are identical for both tasks; only the intercepts differ.

Fixed Effects: VDT (Visual Detection Task). Posterior means (link scale) with 95% CrIs for drift (v), boundary separation (a/bs), and starting-point bias (z). In the additive model, difficulty and effort contrasts are identical for both tasks; only the intercepts differ.

3.2.2 Summary Table

Table: Fixed Effects Summary (Link Scale)

| Parameter | Mean | 2.5% | 97.5% | Rhat | ESS Bulk |
| --- | --- | --- | --- | --- | --- |
| Bias (z): ADT | 0.268 | 0.188 | 0.348 | 1.00 | 14,429 |
| Bias (z): VDT | 0.206 | 0.127 | 0.287 | 1.00 | 14,429 |
| Boundary (a): ADT | 0.822 | 0.770 | 0.873 | 1.00 | 18,206 |
| Boundary (a): VDT | 0.762 | 0.711 | 0.813 | 1.00 | 18,206 |
| Drift (v): ADT | -1.260 | -1.365 | -1.158 | 1.00 | 15,672 |
| Drift (v): VDT | -1.117 | -1.224 | -1.013 | 1.00 | 15,672 |
| Drift (v): Intercept | -1.260 | -1.365 | -1.158 | 1.00 | 2,165 |
| Non-decision time (t₀): ADT | -1.536 | -1.556 | -1.518 | 1.00 | 17,447 |
| Non-decision time (t₀): VDT | -1.500 | -1.517 | -1.484 | 1.00 | 17,447 |
| Bias (z): Intercept | 0.268 | 0.188 | 0.348 | 1.00 | 4,765 |
| Bias (z): Difficulty: Easy | -0.078 | -0.145 | -0.011 | 1.00 | 10,652 |
| Bias (z): Difficulty: Hard | -0.050 | -0.113 | 0.011 | 1.00 | 10,589 |
| Boundary (a): Intercept | 0.822 | 0.770 | 0.873 | 1.00 | 1,152 |
| Boundary (a): Difficulty: Easy | -0.131 | -0.153 | -0.109 | 1.00 | 14,448 |
| Boundary (a): Difficulty: Hard | -0.066 | -0.086 | -0.047 | 1.00 | 14,230 |
| Drift (v): Difficulty: Easy | 2.170 | 2.108 | 2.232 | 1.00 | 10,789 |
| Drift (v): Difficulty: Hard | 0.616 | 0.558 | 0.674 | 1.00 | 10,598 |
| Drift (v): Effort: High_40_MVC | -0.052 | -0.084 | -0.020 | 1.00 | 21,386 |
| Non-decision time (t₀): Intercept | -1.536 | -1.556 | -1.518 | 1.00 | 15,848 |
| Non-decision time (t₀): Effort: High_40_MVC | 0.023 | 0.006 | 0.039 | 1.00 | 22,738 |
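For interpretation, the link-scale posterior means can be converted to natural units. The sketch assumes log links for boundary and non-decision time and a logit link for bias. Transforming posterior means is only an approximation to the posterior mean of the transformed quantity, so the printed values are indicative.

```python
import math

def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Posterior means from the fixed-effects table (link scale).
bs_adt, bs_vdt = 0.822, 0.762          # assumed log scale
ndt_adt, ndt_vdt = -1.536, -1.500      # assumed log scale, seconds
bias_adt, bias_vdt = 0.268, 0.206      # assumed logit scale
ndt_effort_high = 0.023                # log-scale shift for High effort

a_adt, a_vdt = math.exp(bs_adt), math.exp(bs_vdt)
t0_adt, t0_vdt = math.exp(ndt_adt), math.exp(ndt_vdt)
z_adt, z_vdt = inv_logit(bias_adt), inv_logit(bias_vdt)

# A log-scale shift is multiplicative, so its size in ms depends on the baseline.
t0_effort_ms = 1000.0 * t0_adt * (math.exp(ndt_effort_high) - 1.0)

print(f"a:  ADT = {a_adt:.3f}, VDT = {a_vdt:.3f}")
print(f"t0: ADT = {1000 * t0_adt:.0f} ms, VDT = {1000 * t0_vdt:.0f} ms")
print(f"z:  ADT = {z_adt:.3f}, VDT = {z_vdt:.3f}")
print(f"High-effort t0 shift at the ADT baseline = {t0_effort_ms:.1f} ms")
```

With these inputs the High-effort shift in t₀ comes out at about 5 ms on an ADT baseline of about 215 ms.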

3.3 Parameter Contrasts

Table: Posterior Contrasts (Directional Probabilities)

| Contrast | Parameter | Mean Δ | 2.5% | 97.5% | P(Δ>0) | P(Δ<0) | P(in ROPE)¹ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Difficulty: Easy | bias | -0.078 | -0.145 | -0.011 | 0.012 | 0.988 | 0.210 |
| Difficulty: Hard | bias | -0.050 | -0.113 | 0.011 | 0.055 | 0.945 | 0.494 |
| Difficulty: Easy | bs | -0.131 | -0.153 | -0.109 | 0.000 | 1.000 | 0.000 |
| Difficulty: Hard | bs | -0.066 | -0.086 | -0.047 | 0.000 | 1.000 | 0.049 |
| Effort: High_40_MVC | ndt | 0.023 | 0.006 | 0.039 | 0.996 | 0.004 | 1.000 |
| Easy (absolute) | v | 0.910 | 0.811 | 1.008 | 1.000 | 0.000 | 0.000 |
| Easy vs. Hard | v | 1.554 | 1.504 | 1.604 | 1.000 | 0.000 | 0.000 |
| Easy vs. Standard | v | 2.170 | 2.108 | 2.232 | 1.000 | 0.000 | 0.000 |
| Hard (absolute) | v | -0.643 | -0.740 | -0.546 | 0.000 | 1.000 | 0.000 |
| Hard vs. Standard | v | 0.616 | 0.558 | 0.674 | 1.000 | 0.000 | 0.000 |
| High vs. Low | v | -0.052 | -0.084 | -0.020 | 0.001 | 0.999 | 0.026 |

¹ ROPE (Region of Practical Equivalence): |Δ| < 0.02 for drift (v), |Δ| < 0.05 for boundary (bs) and bias (z) on link scales.

Key contrasts interpreted:

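  • Easy vs. Hard on drift (v): Strong positive effect (Mean Δ = +1.554, 95% CrI [1.504, 1.604], P(Δ>0) > 0.99), indicating faster evidence accumulation for easier discriminations. Because the model is additive, this contrast is the same for both tasks.
  • Easy vs. Hard on boundary (a): Negative effect. The Easy and Hard effects relative to Standard are -0.131 and -0.066 on the log scale, so their difference is ≈ -0.065 (about a 6% narrower boundary on Easy trials), consistent with reduced caution.
  • Task differences: VDT differs from ADT on every parameter intercept (link scale): drift -1.117 vs. -1.260, boundary 0.762 vs. 0.822, bias 0.206 vs. 0.268, and t₀ -1.500 vs. -1.536, supporting task-specific processing.
  • Effort on drift and t₀: High effort shows small but credible effects on evidence accumulation (Δv = -0.052) and motor execution time (t₀ increase of 0.023 log-units, or ≈5 ms at the ADT baseline of ≈215 ms). The t₀ effect lies entirely within the ROPE (P(in ROPE) = 1.000).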
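The ROPE probabilities in the contrast table can be approximated from the posterior mean and 95% CrI alone if each posterior is treated as roughly normal. That normality is an assumption made here for illustration. The tabulated values were presumably computed from the posterior draws themselves, so small discrepancies are expected.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def rope_probability(mean, lower, upper, rope_half_width):
    """Approximate P(|delta| < rope_half_width) from a posterior mean and 95% CrI,
    assuming an approximately normal posterior."""
    sd = (upper - lower) / (2.0 * 1.959964)
    return (normal_cdf(rope_half_width, mean, sd)
            - normal_cdf(-rope_half_width, mean, sd))

# (mean, 2.5%, 97.5%, ROPE half-width) taken from the contrast table and its footnote.
contrasts = {
    "bias, Easy": (-0.078, -0.145, -0.011, 0.05),
    "bias, Hard": (-0.050, -0.113, 0.011, 0.05),
    "bs, Hard": (-0.066, -0.086, -0.047, 0.05),
    "v, High vs. Low": (-0.052, -0.084, -0.020, 0.02),
}
approx = {name: rope_probability(*row) for name, row in contrasts.items()}
for name, p in approx.items():
    print(f"{name:16s} P(in ROPE) approx. {p:.3f}")
```

The approximations fall within about 0.01 of the tabulated values (0.210, 0.494, 0.049, 0.026), which suggests the reported ROPE probabilities are consistent with the reported means and intervals.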

3.4 Individual Differences and Parameter Relationships

3.4.1 Subject-Level Parameter Distribution

The hierarchical structure of our model allows us to examine individual differences in DDM parameters across participants. Subject-level random effects capture how each participant deviates from the group-level mean for each parameter.

Subject-Level Parameter Distribution. Violin plots showing the distribution of subject-level random effects (deviations from group mean) for each DDM parameter. The boxplots within each violin show the median and interquartile range. Zero (dashed line) represents the group mean. This visualization highlights the substantial individual variability in decision-making parameters across our sample of 67 older adults.

Interpretation: The distributions reveal substantial individual differences in all DDM parameters. Drift rate (v) shows the widest variability, consistent with the heterogeneity in evidence accumulation speed observed in aging populations. Boundary separation (a) and bias (z) also show meaningful individual variation, supporting the use of hierarchical modeling to account for between-subject differences.

3.4.2 Parameter Correlations

Understanding the relationships between DDM parameters is crucial for interpreting how decision-making components covary. Parameter correlations reveal trade-offs and dependencies that may reflect underlying cognitive strategies.

Subject-Level Parameter Correlation: Drift Intercept vs. Bias Intercept. Scatter plot showing the relationship between subject-level drift intercept and bias intercept estimates. Each point represents one subject (n = 67). The regression line with 95% confidence interval (shaded ribbon) shows the trend. The correlation is r = -0.205, a weak negative relationship: because drift is signed, subjects with more negative drift (stronger drift toward ‘Same’) tend to have slightly higher bias intercepts (starting points shifted toward ‘Different’), and subjects with weaker negative drift tend to have starting points closer to 0.5.

Interpretation: The scatter plot reveals a weak negative correlation (r = -0.205) between drift intercept and bias intercept at the subject level. Read with the sign of drift in mind, participants who show stronger negative drift (better at detecting “sameness”) tend to start slightly closer to the “Different” boundary. The relationship is modest, but it indicates that individual differences in evidence accumulation are related to differences in starting-point bias, possibly reflecting a partial trade-off between the two parameters.

3.4.3 Integrated Condition Effects

To provide a comprehensive view of how experimental manipulations affect all DDM parameters simultaneously, we present an integrated visualization of condition effects across parameters.

Integrated Condition Effects on DDM Parameters. Multi-panel forest plot showing posterior means with 95% credible intervals for difficulty and effort effects on each DDM parameter (drift rate v, boundary separation a, bias z, and non-decision time t₀). All effects are shown on their respective link scales. This integrated view allows direct comparison of effect magnitudes and directions across parameters, revealing that difficulty primarily affects drift rate and boundary separation, while effort shows smaller effects on drift and non-decision time. Note: Difficulty and effort effects are identical across ADT and VDT due to the additive model structure, so a single combined plot is shown.

Interpretation: The integrated plots reveal that difficulty effects are strongest for drift rate (v) and boundary separation (a), with Easy trials showing faster evidence accumulation and reduced caution relative to Hard trials. Effort effects are more modest but consistent across parameters, with High effort reducing drift rate and increasing non-decision time. These effects are consistent across both ADT and VDT, supporting the additive model structure where difficulty and effort effects are shared across tasks, with only intercepts differing between modalities.

3.4.4 Brinley Plot: Reaction Time Relationships

Brinley plots are a classic visualization in cognitive aging research that reveal generalized slowing patterns by plotting RT in one condition against RT in another condition (Brinley, 1965). The slope of the regression line indicates the degree of generalized slowing, with slopes > 1 indicating disproportionate slowing in more difficult conditions.

Brinley Plot: Reaction Time Relationships. Scatter plot showing mean RT for Hard trials (y-axis) versus Easy trials (x-axis) for each participant, colored by effort condition (blue = Low effort, crimson = High effort). Reaction times are displayed in milliseconds (ms). The dashed diagonal (identity line, slope = 1) marks equal RT in the two conditions. The solid black regression line shows the actual relationship with 95% confidence interval (shaded ribbon); a slope > 1 indicates that slowing on Hard trials grows disproportionately with a participant's Easy-trial RT. Points above the diagonal indicate participants who were slower on Hard than on Easy trials. This visualization connects our DDM findings to the broader cognitive aging literature on generalized slowing (Brinley, 1965; Cerella, 1985; Salthouse, 1996).

Interpretation: The Brinley plot reveals a strong positive relationship between Easy and Hard RTs, with a slope > 1 indicating disproportionate slowing in Hard trials—a hallmark of generalized slowing in older adults (Cerella, 1985; Salthouse, 1996). The scatter of points around the regression line reflects individual differences in the magnitude of difficulty effects, consistent with the heterogeneity in drift rate effects observed in our DDM analysis. The separation by effort condition suggests that high effort may exacerbate the difficulty effect for some participants, though this pattern requires further investigation.
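The slope referred to above is the ordinary least-squares slope of Hard RT on Easy RT across participants. The sketch below shows the computation on made-up per-subject means. The numbers are hypothetical and are not taken from this study.

```python
# Hypothetical per-subject mean RTs in ms (NOT values from this study).
easy_rt = [780.0, 850.0, 910.0, 990.0, 1050.0, 1120.0]
hard_rt = [960.0, 1060.0, 1150.0, 1270.0, 1350.0, 1460.0]

def ols_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = ols_line(easy_rt, hard_rt)

# Points above the identity line are subjects who are slower on Hard than on Easy.
n_above_identity = sum(h > e for e, h in zip(easy_rt, hard_rt))

print(f"slope = {slope:.2f}, intercept = {intercept:.0f} ms")
print(f"{n_above_identity} of {len(easy_rt)} subjects are slower on Hard than Easy")
```

In this toy data set the slope exceeds 1, so the Hard-minus-Easy difference grows with a subject's Easy RT, which is the pattern the Brinley interpretation relies on.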

3.5 Model Convergence & Selection

All parameters converged well on the recorded diagnostics (max \(\hat{R}\) = 1.003; min bulk ESS ≈ 805; no divergent transitions; tail ESS was not recorded in the gate table). Leave-one-out cross-validation strongly favored a model in which difficulty modulates drift, boundary separation, and starting-point bias jointly (v+a+z), relative to drift-only or simpler models (ΔELPD ≈ +185, SE ≈ 20).

3.6 Difficulty Effects

Drift rate (v): Easy trials show faster evidence accumulation than Hard trials (strong positive contrast, P(Δ>0) > 0.99 for both tasks).

Drift Rate (v) Across Difficulty Levels. Forest plot showing drift rate estimates with 95% credible intervals for Standard (Δ=0), Hard (Low Signal), and Easy (High Signal) trials. Positive values indicate drift toward ‘Different’ (upper boundary); negative values indicate drift toward ‘Same’ (lower boundary). Standard trials show strong negative drift (v ≈ -1.26), indicating participants actively accumulate evidence toward ‘Same’ responses when stimuli are identical. Hard trials also show negative, though weaker, drift (v ≈ -0.64, the absolute Hard estimate in the contrast table), explaining the below-chance accuracy (~30%). Easy trials show positive drift (v ≈ +0.91), enabling accurate discrimination. This pattern demonstrates the strong impact of stimulus discriminability on evidence accumulation.

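Boundary separation (a): Easy trials have narrower decision boundaries than Standard trials (Δ = -0.131 on the log scale, 95% CrI [-0.153, -0.109]) and than Hard trials (Hard vs. Standard Δ = -0.066), consistent with reduced caution when discrimination is easier.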

3.7 Task Differences (ADT vs. VDT)

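ADT and VDT are separate experimental conditions with distinct parameter profiles. Relative to ADT, VDT shows a less negative baseline drift (-1.117 vs. -1.260), narrower boundaries (0.762 vs. 0.822 on the log scale), and less starting-point bias toward “different” (0.206 vs. 0.268 on the logit scale), supporting modality-specific processing strategies.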

3.8 Effort Effects

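High effort (40% MVC) produces small but credible effects on drift rate (Δ = -0.052, 95% CrI [-0.084, -0.020]) and non-decision time (Δ = +0.023 log-units, 95% CrI [0.006, 0.039]), suggesting that physical effort modulates both information accumulation and motor execution speed. The non-decision-time effect falls entirely within the ROPE, so its practical magnitude is small.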

3.9 Model Fit

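Absolute fit: The subject-wise mid-body PPC gate was not met. All 12 Task × Effort × Difficulty cells were flagged, with mid-body QP RMSE between 0.143 and 0.234 s against the 0.09 s threshold (Section 4.2). Choice proportions on Standard trials were reproduced closely (Section 4.1).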

PPC Summary (Joint Model): PPCs were good for Standard and Easy cells (QP RMSE < 0.13, KS < 0.08), with modest misfit in VDT-Hard (worst QP RMSE ≈ 0.206). This pattern suggests some residual fast-tail behavior not captured by a constant-drift Wiener process.

Observed vs. model-predicted p(‘different’) across 12 cells (Task × Effort × Difficulty).

PPC best/median/worst cells (QP RMSE and KS with thresholds).

Known limitation: Pooled conditional PPCs reveal residual fast-tail misfit, most pronounced in Easy/VDT conditions. This is a known limitation of constant-drift Wiener DDMs without across-trial variability (sv, sz, st₀) or explicit contaminant/lapse processes.

3.10 Model Validation: Parameter Consistency and Sanity Checks

To validate the internal consistency of our model estimates, we performed three sanity checks recommended by independent expert review. These checks verify that parameter estimates are mathematically consistent with observed behavioral patterns.

3.10.1 RT Asymmetry on Standard Trials

On Standard trials, the model estimated a strong negative drift rate (v = -1.404) toward “Same” responses, combined with a slight starting-point bias toward “Different” (z = 0.567). To verify the consistency of these estimates, we examined whether RT patterns align with model predictions.

RT Distribution by Response Type on Standard Trials. Density plots showing reaction time distributions for “Same” (crimson) and “Different” (blue) responses on Standard (Δ=0) trials. “Same” responses were 293 ms faster on average (mean RT = 1.03 s) than “Different” responses (mean RT = 1.32 s). This pattern is consistent with strong negative drift: when participants rapidly accumulate evidence for identity, they reach the “Same” boundary quickly, resulting in fast RTs. “Different” responses (errors) occur less frequently and typically require more time, possibly reflecting near-deadline responses when the process fails to reach the “Same” boundary before the response window closes.

Result: “Same” responses were faster on average than “Different” responses (mean RT: 1.03 s vs. 1.32 s, difference = 293 ms). This pattern is consistent with the model's account: strong negative drift carries evidence quickly toward “Same”, producing fast “Same” responses, while the slower “Different” responses likely reflect rare errors that occur when the process fails to reach the “Same” boundary within the response window.

3.10.2 Hard Trial Drift Direction

The primary model estimated a negative absolute drift on Hard trials: the Hard vs. Standard contrast is positive (+0.616) but not large enough to offset the Standard intercept (-1.260). To verify that this estimate is consistent with the observed below-chance accuracy on Hard trials, we examined the posterior distribution of Hard trial drift rates.

Posterior Distribution of Hard Trial Drift Rate. Density plot showing the posterior distribution of drift rate (v) for Hard trials. The mean drift rate is -0.643 (95% CrI: [-0.740, -0.546]), with 100% of posterior draws below zero. This confirms that Hard trials have negative drift toward “Same” responses, explaining why participants choose “Same” approximately 70% of the time even when stimuli differ. The weak sensory evidence for difference (small stimulus offsets) is insufficient to overcome the baseline tendency to see stimuli as identical.

Result: Hard trials show consistently negative drift (mean v = -0.643, 95% CrI: [-0.740, -0.546], P(v < 0) = 100%). This confirms that the sensory evidence for difference on Hard trials is too weak to overcome the baseline tendency toward “Same”, explaining the observed below-chance accuracy (~30%) on Hard trials.
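The link between negative drift and below-chance accuracy can be made concrete with the closed-form probability of absorption at the upper boundary of a Wiener process. The sketch below assumes a unit diffusion coefficient and plugs in posterior means at the ADT baseline (Low effort). It ignores subject heterogeneity and the effort effect, so the result is a group-mean approximation rather than a model prediction.

```python
import math

def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def p_upper(v, a, z):
    """Probability of absorption at the upper boundary for a Wiener process with
    drift v, boundary separation a, relative starting point z, and unit diffusion."""
    if abs(v) < 1e-12:
        return z  # zero-drift limit
    return (1.0 - math.exp(-2.0 * v * a * z)) / (1.0 - math.exp(-2.0 * v * a))

# Posterior means from the fixed-effects and contrast tables (ADT baseline).
v_hard = -0.643                       # absolute Hard drift
a_hard = math.exp(0.822 - 0.066)      # log-scale intercept plus Hard effect
z_hard = inv_logit(0.268 - 0.050)     # logit-scale intercept plus Hard effect

p_different_hard = p_upper(v_hard, a_hard, z_hard)
print(f"a = {a_hard:.3f}, z = {z_hard:.3f}, "
      f"P('different' | Hard) = {p_different_hard:.3f}")
```

At these group-level values the probability of a “different” response on Hard trials is roughly 0.25, well below 0.5 and in the neighbourhood of the observed ~30%.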

3.10.3 Subject Heterogeneity in Drift Rates

The discrepancy between analytical predictions (using group-level mean parameters) and PPC results (using full posterior with subject heterogeneity) suggests substantial individual differences in drift rates. To verify this, we examined the distribution of subject-level drift rate estimates.

Distribution of Subject-Level Drift Rates on Standard Trials. Histogram showing the distribution of subject-specific drift rate estimates (group-level drift plus subject-specific deviation) for Standard trials. The distribution shows substantial heterogeneity (SD = 0.65, range: -3.08 to -0.21), with most subjects showing strong negative drift (|v| ≥ 1.0) but a small subset (4.5%) showing weak drift (|v| < 0.5). This heterogeneity explains why the analytical formula (using group means) under-predicts error rates compared to PPC (which respects individual differences), consistent with Jensen’s Inequality in hierarchical models.

Result: Subject-level drift rates show substantial heterogeneity (SD = 0.65, range: -3.08 to -0.21). Most subjects (60%) show strong negative drift (|v| ≥ 1.0), while a small subset (4.5%) show weak drift (|v| < 0.5). This heterogeneity explains the PPC vs. analytical formula discrepancy: subjects with weaker drift contribute disproportionately to error rates, but their contribution is masked when using group-level mean parameters.
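The aggregation effect can be illustrated numerically. The sketch below compares the probability of a “different” response evaluated at the group-mean drift with the same probability averaged over a normal distribution of subject drifts. It is illustrative only: it combines the primary-model Standard drift, boundary, and bias (ADT baseline) with the between-subject SD quoted above, assumes normally distributed subject drifts and a unit diffusion coefficient, and holds boundary and bias fixed across subjects.

```python
import math

def p_upper(v, a, z):
    """Upper-boundary absorption probability for a Wiener process (unit diffusion)."""
    if abs(v) < 1e-12:
        return z  # zero-drift limit
    return (1.0 - math.exp(-2.0 * v * a * z)) / (1.0 - math.exp(-2.0 * v * a))

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Illustrative inputs (see the assumptions stated in the text above).
v_mean, v_sd = -1.26, 0.65
a = math.exp(0.822)
z = 1.0 / (1.0 + math.exp(-0.268))

# P('different') evaluated at the group-mean drift.
p_at_mean = p_upper(v_mean, a, z)

# P('different') averaged over subject drifts, approximated on an evenly
# spaced grid spanning 4 SD on either side of the mean.
n_grid = 2001
step = 8.0 * v_sd / (n_grid - 1)
grid = [v_mean - 4.0 * v_sd + i * step for i in range(n_grid)]
weights = [normal_pdf(v, v_mean, v_sd) for v in grid]
p_averaged = sum(w * p_upper(v, a, z) for w, v in zip(weights, grid)) / sum(weights)

print(f"P('different') at the mean drift:     {p_at_mean:.3f}")
print(f"P('different') averaged over subjects: {p_averaged:.3f}")
```

Because the choice probability is strongly curved in the drift rate over this range, the subject-averaged probability exceeds the value at the mean drift, which is the direction of the discrepancy described above.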

Conclusion: All three sanity checks support the internal consistency of our model estimates. RT patterns, drift directions, and individual differences are in line with the model's account, indicating that the hierarchical DDM captures the main choice patterns in our data, subject to the RT-distribution misfit documented in Section 4.

4 Posterior Predictive Checks

4.1 PPC Validation Method

Posterior Predictive Checks (PPC) were performed to validate model fit by comparing observed data to data simulated from the fitted model. To avoid aggregation bias (Jensen’s Inequality) inherent in using group-level mean parameters in non-linear formulas, we used full posterior predictive sampling that respects subject-level random effects (Vehtari et al., 2020). This approach generates predictions for every trial in the dataset, maintaining the hierarchical structure of the model.

PPC Implementation: For the primary model, we generated 1,000 posterior predictive draws using brms::posterior_predict() with negative_rt = TRUE to obtain signed reaction times (positive RT = “Different”/upper boundary, negative RT = “Same”/lower boundary). This parameter is critical for correctly extracting choice predictions from brms Wiener models. For each draw, we computed the proportion of “Different” responses and compared the distribution of predicted proportions to the observed proportion in the data.

PPC Results: On Standard trials, the model accurately predicted choice proportions: observed 10.9% “Different” responses vs. predicted 11.2% (95% credible interval: [9.9%, 12.7%]). The difference of 0.3 percentage points is small, and the observed value falls within the 95% credible interval, indicating good fit for Standard-trial choice proportions.

Posterior Predictive Check: Choice Proportions on Standard Trials. Histogram showing the distribution of predicted proportions of ‘Different’ responses across 500 posterior predictive draws. The red dashed vertical line indicates the observed proportion (10.9%), and the blue solid line indicates the mean predicted proportion (11.2%). The orange dotted lines show the 95% credible interval ([9.9%, 12.7%]). The observed value falls well within the predicted distribution, confirming that the hierarchical model accurately captures the data structure and avoids aggregation bias (Jensen’s Inequality) by respecting subject-level heterogeneity.
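The summary step of this check (one predicted proportion per draw, then a mean and a 95% interval) can be sketched as follows. The signed-RT matrix here is simulated with made-up parameters purely to stand in for the output of a posterior predictive call. Only the summarizing logic is the point, and the helper names are hypothetical.

```python
import random

random.seed(1)

# Hypothetical stand-in for a posterior predictive matrix (draws x trials) of signed
# RTs: positive = "different" (upper boundary), negative = "same" (lower boundary).
n_draws, n_trials = 200, 500
signed_rt = []
for _ in range(n_draws):
    p_diff = random.gauss(0.11, 0.01)          # made-up draw-level choice probability
    row = []
    for _ in range(n_trials):
        rt = 0.25 + random.expovariate(1.5)    # made-up RT in seconds
        row.append(rt if random.random() < p_diff else -rt)
    signed_rt.append(row)

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    low = int(pos)
    high = min(low + 1, len(sorted_values) - 1)
    return sorted_values[low] + (pos - low) * (sorted_values[high] - sorted_values[low])

# One predicted proportion of "different" responses per draw (sign of the RT).
pred_prop = sorted(sum(rt > 0 for rt in row) / n_trials for row in signed_rt)
pred_mean = sum(pred_prop) / n_draws
ci_low, ci_high = quantile(pred_prop, 0.025), quantile(pred_prop, 0.975)

observed_prop = 0.109  # observed proportion reported for Standard trials
covered = ci_low <= observed_prop <= ci_high
print(f"predicted mean = {pred_mean:.3f}, 95% interval = [{ci_low:.3f}, {ci_high:.3f}]")
print("observed proportion inside interval:", covered)
```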

4.2 Primary PPC Gate: Subject-Wise Mid-Body Quantiles

Our primary gate for model acceptance is the subject-wise mid-body PPC (conditional on response, 2% censored). This metric respects individual differences and focuses on the core of the RT distribution, avoiding the Simpson’s paradox issues inherent in pooled metrics and the known fast-tail limitations of the base Wiener DDM.

Thresholds (pre-declared):

  • QP RMSE fail > 0.12 s (warn > 0.09 s)
  • KS statistic fail > 0.20 (warn > 0.15)
  • Target: ≤ 15% of cells flagged

Subject-Wise Mid-Body PPC (30/50/70% quantiles; censored 2%)

| task | effort_condition | difficulty_level | n | qp_rmse (s) | ks_mean | qp_rmse_midbody (s) | emp_accuracy | qp_flag | ks_flag | midbody_flag | any_flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ADT | Low_5_MVC | Standard | 881 | 0.281 | 0.314 | 0.186 | 0.824 | TRUE | TRUE | TRUE | TRUE |
| ADT | Low_5_MVC | Hard | 1776 | 0.354 | 0.290 | 0.234 | 0.312 | TRUE | TRUE | TRUE | TRUE |
| ADT | Low_5_MVC | Easy | 1777 | 0.254 | 0.270 | 0.178 | 0.806 | TRUE | TRUE | TRUE | TRUE |
| ADT | High_MVC | Standard | 841 | 0.250 | 0.290 | 0.166 | 0.860 | TRUE | TRUE | TRUE | TRUE |
| ADT | High_MVC | Hard | 1673 | 0.349 | 0.278 | 0.230 | 0.278 | TRUE | TRUE | TRUE | TRUE |
| ADT | High_MVC | Easy | 1687 | 0.276 | 0.288 | 0.177 | 0.795 | TRUE | TRUE | TRUE | TRUE |
| VDT | Low_5_MVC | Standard | 882 | 0.257 | 0.318 | 0.181 | 0.917 | TRUE | TRUE | TRUE | TRUE |
| VDT | Low_5_MVC | Hard | 1751 | 0.356 | 0.290 | 0.199 | 0.331 | TRUE | TRUE | TRUE | TRUE |
| VDT | Low_5_MVC | Easy | 1698 | 0.230 | 0.287 | 0.155 | 0.899 | TRUE | TRUE | TRUE | TRUE |
| VDT | High_MVC | Standard | 868 | 0.241 | 0.302 | 0.162 | 0.910 | TRUE | TRUE | TRUE | TRUE |
| VDT | High_MVC | Hard | 1732 | 0.342 | 0.275 | 0.201 | 0.297 | TRUE | TRUE | TRUE | TRUE |
| VDT | High_MVC | Easy | 1677 | 0.228 | 0.286 | 0.143 | 0.909 | TRUE | TRUE | TRUE | TRUE |

Subject-Wise PPC Summary

| Metric | Value |
|---|---|
| N Cells | 12 |
| N Flagged | 12 |
| % Flagged | 100.0% |

Result: 12 of 12 cells (100.0%) were flagged, far above the pre-declared target of ≤ 15%, so the subject-wise mid-body PPC gate (based on strict pooled quantiles) was not met; we attribute this to fast-tail deviations. However, as detailed below, the joint model cell-wise PPCs show that the model captures the central tendencies for the majority of conditions (Standard/Easy), with misfit primarily concentrated in VDT-Hard.
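The gate decision can be re-derived from the table above with a short plain-Python sketch. It assumes that a cell is flagged when either metric exceeds its fail threshold (the exact rule behind `any_flag` is not spelled out in the text) and it leaves out `qp_rmse_midbody`, whose threshold is not stated separately. The `qp_rmse` and `ks_mean` values are copied from the table.

```python
QP_RMSE_FAIL = 0.12            # seconds, pre-declared fail threshold
KS_FAIL = 0.20                 # pre-declared fail threshold
TARGET_MAX_FLAGGED_PCT = 15.0  # gate target: at most 15% of cells flagged

# (task, effort, difficulty, qp_rmse, ks_mean), copied from the table above.
CELLS = [
    ("ADT", "Low_5_MVC", "Standard", 0.281, 0.314),
    ("ADT", "Low_5_MVC", "Hard", 0.354, 0.290),
    ("ADT", "Low_5_MVC", "Easy", 0.254, 0.270),
    ("ADT", "High_MVC", "Standard", 0.250, 0.290),
    ("ADT", "High_MVC", "Hard", 0.349, 0.278),
    ("ADT", "High_MVC", "Easy", 0.276, 0.288),
    ("VDT", "Low_5_MVC", "Standard", 0.257, 0.318),
    ("VDT", "Low_5_MVC", "Hard", 0.356, 0.290),
    ("VDT", "Low_5_MVC", "Easy", 0.230, 0.287),
    ("VDT", "High_MVC", "Standard", 0.241, 0.302),
    ("VDT", "High_MVC", "Hard", 0.342, 0.275),
    ("VDT", "High_MVC", "Easy", 0.228, 0.286),
]

def is_flagged(qp_rmse, ks):
    """Assumed rule: flag a cell if either metric exceeds its fail threshold."""
    return qp_rmse > QP_RMSE_FAIL or ks > KS_FAIL

flags = [is_flagged(qp, ks) for *_, qp, ks in CELLS]
pct_flagged = 100.0 * sum(flags) / len(flags)
gate_met = pct_flagged <= TARGET_MAX_FLAGGED_PCT
print(f"{pct_flagged:.1f}% flagged; gate met: {gate_met}")
```

Every cell exceeds both fail thresholds, so the outcome does not depend on whether flags are combined with a logical OR or AND.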

4.3 Visual Diagnostics

4.3.1 RT Distribution Overlays

Posterior Predictive Check: RT Distributions. Empirical (black solid) vs. posterior predictive (blue solid) RT densities by Task × Effort × Difficulty. Overall model fit is good for central tendencies, with some misfit in fast tails (especially Easy/VDT).

4.3.2 Quantile-Probability (QP) Plots

Quantile-Probability (QP) Plot. Empirical vs. predicted RT quantiles by difficulty level, with separate panels for Task × Effort. Points colored by difficulty (Standard=gray, Easy=blue, Hard=red) and shaped by response type (Correct/Error). Dashed diagonal = perfect prediction. Deviations primarily occur in fast tails for Easy/VDT conditions.

4.3.3 Sensitivity Analyses

We conducted additional sensitivity analyses (Unconditional Pooled PPC, Conditional Pooled PPC) which confirmed that the core findings are robust, though strict pooled metrics flag more cells due to fast-tail misfit. These additional checks are detailed in the Supplementary Figures.

5 Discussion

5.1 Summary of Key Findings

Our hierarchical drift-diffusion model indicated that high physical effort (40% MVC) reduced drift rates (\(v\)) and lengthened non-decision time (\(t_0\)), but did not increase boundary separation (\(a\)). This pattern is consistent with the “detrimental impact” hypothesis grounded in Resource Competition theory (Azer et al., 2023; Wickens, 2008), which predicts that concurrent physical effort consumes shared cognitive resources, degrading the quality of evidence accumulation. However, the null effect on boundary separation is at odds with the “adaptive caution” hypothesis (Strategic Adaptation), which predicted that older adults would respond to increased internal noise by raising their decision thresholds to preserve accuracy.

5.2 The “Crunch Point”: Why Drift Rate Declined

Our finding that high physical effort reduced drift rates, indicating slower and noisier information accumulation, challenges simple arousal-facilitation accounts and aligns more closely with resource-depletion models. Specifically, the observed decline in processing efficiency is consistent with the Compensation-Related Utilization of Neural Circuits Hypothesis (CRUNCH; Reuter-Lorenz & Cappell, 2008). The CRUNCH model posits that while older adults can effectively recruit compensatory neural resources to meet lower task demands, they hit a resource ceiling or “crunch point” at lower levels of objective difficulty than younger adults. Once this threshold is crossed, compensatory mechanisms fail, and performance declines precipitously. In the present study, the 40% MVC condition likely pushed participants beyond this critical tipping point. Rather than acting as a beneficial arousal boost that “sharpens” neural gain (as predicted by Adaptive Gain Theory for moderate arousal levels; Aston-Jones & Cohen, 2005), the sustained high-effort requirement consumed the limited cognitive resources available for evidence accumulation, resulting in the observed degradation of drift rate.

5.3 The “Dual-Task Cost”: Why Non-Decision Time Slowed

Contrary to the expectation that arousal-induced motor facilitation might speed up response execution, we observed a slowing of non-decision time (\(t_0\)) under high effort. This result is best understood through the framework of Cognitive-Motor Interference (CMI) (Seidler et al., 2010; Woollacott & Shumway-Cook, 2002). In healthy aging, motor control processes—such as maintaining a precise isometric grip—become less automatic and increasingly reliant on executive attentional resources, a phenomenon known as dedifferentiation (Seidler et al., 2010). Consequently, the “physical” task of gripping competes directly with the “cognitive” task of motor planning and response selection. In our dual-task paradigm, the attentional demand required to maintain the 40% MVC force likely drew upon the same shared resource pool needed to initiate the button press, creating a bottleneck that manifested as a prolongation of the non-decision component (\(t_0\)). This suggests that for older adults, concurrent physical exertion acts less as a passive background state and more as an active dual-task stressor that interferes with the efficiency of the motor loop.

5.4 Strategic Rigidity: Why Caution (\(a\)) Didn’t Increase

Despite the internal noise introduced by high effort (as evidenced by reduced drift rates), older adults failed to dynamically adjust their decision criteria by increasing boundary separation (\(a\)). This null finding suggests strategic rigidity in aging: older adults may have difficulty flexibly modulating their decision thresholds in response to changing task demands, even when such adaptation would be beneficial. This rigidity could reflect reduced executive flexibility or a tendency to maintain a fixed “safety-first” strategy regardless of context. While older adults are generally risk-averse and prioritize accuracy (Starns & Ratcliff, 2010), the failure to increase caution under conditions of degraded evidence quality may indicate that the cognitive resources needed for strategic adjustment are themselves depleted by the dual-task demands of the high-effort condition.

5.5 Bias and Phasic Arousal

Regarding starting-point bias (\(z\)), our results revealed a consistent bias toward the upper (“Different”) boundary, with a posterior mean of \(z = 0.567\) (95% CrI [0.534, 0.601]) on Standard trials, indicating a slight preference for “different” responses. This bias was robust across effort conditions, with no credible effect of effort level (High vs. Low contrast: Δ = 0.048, 95% CrI [-0.025, 0.120], P(Δ>0) = 0.903). However, we did observe a credible task difference: VDT showed less bias toward “different” than ADT (Δ = -0.179, 95% CrI [-0.259, -0.101], P(Δ>0) < 0.001), suggesting modality-specific differences in response tendencies.
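Contrast summaries of this kind (posterior mean, 95% CrI, and P(Δ>0)) are simple functions of the posterior draws. The plain-Python sketch below shows the computation on simulated stand-in draws: they are generated from a normal approximation whose mean and spread are back-calculated from the reported effort contrast, and they are not the model's actual posterior draws. The function name `summarize_contrast` is hypothetical.

```python
import random

def summarize_contrast(draws):
    """Posterior mean, 95% equal-tailed interval, and P(delta > 0) from draws."""
    xs = sorted(draws)
    n = len(xs)

    def q(p):
        # Linear-interpolation quantile on the sorted draws.
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return {
        "mean": sum(xs) / n,
        "cri_95": (q(0.025), q(0.975)),
        "p_gt_0": sum(x > 0 for x in xs) / n,
    }

# Stand-in draws: normal approximation to the reported effort contrast
# (mean 0.048, 95% CrI [-0.025, 0.120]); an assumption for illustration.
rng = random.Random(2024)
approx_sd = (0.120 - (-0.025)) / (2 * 1.96)
stand_in_draws = [rng.gauss(0.048, approx_sd) for _ in range(20000)]

effort_summary = summarize_contrast(stand_in_draws)
print(effort_summary)
```

Under this normal approximation, P(Δ>0) comes out near 0.90, in line with the reported value of 0.903, so the reported interval and tail probability are mutually consistent.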

These findings can be interpreted in the context of LC-NE system dynamics. Recent work suggests that phasic arousal, indexed by pupil dilation, can suppress pre-existing choice biases, “resetting” the decision process to a neutral state (de Gee et al., 2020). In our study, the lack of a credible effort effect on bias suggests that the high-effort manipulation (40% MVC) may not have elicited strong enough phasic arousal responses to modulate starting-point bias, or that any such effects were offset by other factors. Alternatively, the integrity of the LC-NE system in our older adult sample may have moderated the expected bias suppression (Huang & Clewett, 2024). The task-specific bias differences (VDT < ADT) may reflect inherent differences in how auditory versus visual detection tasks engage response strategies, independent of arousal state. Future work integrating direct pupillometry measures will be needed to test whether effort-induced phasic arousal responses are indeed present but insufficient to shift bias, or whether the LC-NE system’s responsiveness to physical effort differs from its responsiveness to cognitive challenge in older adults.

5.6 Limitations & Conclusion

Our findings must be interpreted within the constraints of the response-signal design, where RTs are measured from response-screen onset rather than stimulus onset. This design constrains the interpretation of \(t_0\) to primarily reflect motor execution and response selection, excluding early perceptual/encoding processes (see Section 7.2 for a detailed discussion). Despite these constraints, our results suggest that effort regulation is critical for older adults because they have a lower “tipping point” at which effort becomes interference. The CRUNCH model and CMI framework together offer an account of why the 40% MVC condition pushed older adults past their compensatory capacity, resulting in degraded processing efficiency (reduced drift) and slowed motor execution (increased \(t_0\)), without the adaptive increase in caution that might have mitigated these effects. These findings underscore the importance of managing effort levels in real-world contexts where older adults must balance physical and cognitive demands.

6 Data Availability & Funding

6.1 Sample Size & Precision

With N=67 subjects and ~266 trials per subject (17,834 total), hierarchical estimation provides adequate precision for group-level and subject-level effects. Effective sample sizes (ESS) for all parameters exceeded 400, indicating stable posterior estimates.

6.2 Data & Code Availability

All analysis code and de-identified data are available in the project repository:
Repository: modeling-pupil-DDM
Analysis scripts: R/, scripts/
Report source: reports/chap3_ddm_results.qmd

Note: The behavioral dataset and detailed task methodology are described in the LC behavioral report manuscript (see References). This DDM analysis uses the same dataset and participants.

6.3 Funding

This research was supported by the National Institutes of Health (Project ID: 11096010). Additional grant details can be found at: https://reporter.nih.gov/search/l8qkCFX0Cki47b9kZOa3Pg/project-details/11096010. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

7 Limitations & Future Directions

7.1 Model Family Limitations

  1. Constant-drift Wiener DDM: The base Wiener DDM assumes constant drift within each trial and no across-trial variability in drift (sv), starting point (sz), or non-decision time (st₀). This can underfit fast RT tails, especially in VDT-Hard conditions. Response-signal timing also limits the identifiability of across-trial variability. Future work could add a small contaminant mixture, across-trial variability (sv, sz), or urgency/collapsing bounds; LBA/race models may better capture fast-tail dynamics in the Easy/VDT regime.

  2. Non-decision time (t₀) random effects omitted: In the response-signal design, t₀ primarily reflects motor execution. We modeled t₀ with group-level intercepts and small task/effort effects but omitted subject-level random effects due to identifiability concerns and initialization failures in pilot models. This may underestimate individual differences in motor execution speed.

  3. Alternative model families: Linear Ballistic Accumulator (LBA) or race models may provide better fit for fast-tail dynamics, particularly for Easy/VDT. These models allow for more flexible RT distributions and may better accommodate the response-signal design.
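To make the constant-drift assumption in item 1 concrete, the following plain-Python sketch simulates a base Wiener process with an Euler scheme. Drift, starting point, and non-decision time are held fixed on every trial, which is exactly the restriction that across-trial variability parameters (sv, sz, st₀) would relax. All parameter values are hypothetical, unit diffusion noise is assumed, and this is not the estimation code used for the chapter.

```python
import math
import random

def simulate_wiener_trial(v, a, z, t0, rng, dt=0.001, max_t=5.0):
    """Simulate one trial of a constant-drift Wiener process (unit noise).

    z is the relative starting point in (0, 1). Returns a signed RT in
    seconds: positive for the upper boundary, negative for the lower
    boundary, or None if no boundary is reached within max_t.
    """
    x = z * a
    t = 0.0
    sqrt_dt = math.sqrt(dt)
    while t < max_t:
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        x += v * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= a:
            return t0 + t
        if x <= 0.0:
            return -(t0 + t)
    return None

# Hypothetical parameters: the same v, z, and t0 are used on every trial.
rng = random.Random(7)
trials = [simulate_wiener_trial(v=2.0, a=1.0, z=0.5, t0=0.2, rng=rng)
          for _ in range(500)]
signed_rts = [rt for rt in trials if rt is not None]
prop_upper = sum(rt > 0 for rt in signed_rts) / len(signed_rts)
print(len(signed_rts), round(prop_upper, 3))
```

Because every simulated RT is at least t0 plus a first-passage time, a process of this form has no separate mechanism for generating a cluster of very fast responses.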

7.2 Design-Specific Limitations

  1. Response-signal RT measurement: RTs are measured from response-screen onset, not stimulus onset. This constrains the interpretation of t₀ to motor execution and response selection, excluding early perceptual/encoding processes. While this is appropriate for the current design, it limits generalizability to traditional RT paradigms.

  2. Effort manipulation: Physical effort (grip force) may interact with motor execution in complex ways not fully captured by small fixed effects on t₀. Future work integrating EMG or kinematic measures could provide richer insights into effort-motor interactions.

7.3 Misfit in Easy/VDT

  1. Fast-tail misfit: The most pronounced misfit occurs in Easy/VDT conditions, where the model underpredicts the frequency of very fast correct responses. This suggests a subset of trials may reflect:

    • Anticipatory responses (partially captured by 2% censoring)
    • A “fast-guess” process not represented in the base DDM
    • Extremely high drift rates that are incompatible with the assumed Wiener process for a small subset of trials

    Sensitivity analyses (2% censoring, unconditional PPCs) confirm that substantive conclusions are robust, but future work should explore mixture models or urgency signals to better account for these fast responses.
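The fast-guess account can be illustrated with a small contaminant-mixture sketch in plain Python. It is a hedged illustration only: the base RT sampler is a shifted lognormal used as a stand-in for model-generated RTs (not a Wiener process), and the guess probability and guess window are hypothetical values chosen to make the effect visible.

```python
import random

def stand_in_rt(rng):
    """Stand-in for a model-generated RT (seconds): shifted lognormal."""
    return 0.30 + rng.lognormvariate(-1.0, 0.4)

def sample_with_fast_guesses(base_sampler, p_guess, guess_window, rng):
    """With probability p_guess draw a uniform fast guess; otherwise
    draw from the base RT distribution."""
    if rng.random() < p_guess:
        return rng.uniform(*guess_window)
    return base_sampler(rng)

def lower_quantile(values, q=0.10):
    """Simple order-statistic estimate of a leading-edge quantile."""
    xs = sorted(values)
    return xs[int(q * (len(xs) - 1))]

n_trials = 5000
rng_clean = random.Random(1)
rng_mixed = random.Random(2)

clean_rts = [stand_in_rt(rng_clean) for _ in range(n_trials)]
mixed_rts = [
    sample_with_fast_guesses(stand_in_rt, p_guess=0.10,
                             guess_window=(0.05, 0.20), rng=rng_mixed)
    for _ in range(n_trials)
]

q10_clean = lower_quantile(clean_rts)
q10_mixed = lower_quantile(mixed_rts)
print(round(q10_clean, 3), round(q10_mixed, 3))
```

A modest share of fast guesses pulls the leading-edge quantile well below that of the base distribution, which is the kind of fast-tail discrepancy a model without a guessing component would underpredict.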

8 Conclusions

This chapter presents a comprehensive hierarchical Wiener DDM analysis of a response-signal change-detection task in older adults. The primary model, in which task difficulty modulates drift rate, boundary separation, and starting-point bias, is strongly supported by LOO cross-validation. It did not, however, meet the pre-declared subject-wise mid-body PPC gate (all 12 cells were flagged; Section 4.2), so absolute fit should be interpreted with caution. Key findings (difficulty effects on v, a, and z; task-specific processing differences; and small effort effects) are robust across multiple sensitivity analyses. While the base Wiener DDM shows misfit in fast tails (especially Easy/VDT), the sensitivity analyses suggest that this does not undermine the core substantive conclusions. Future extensions incorporating across-trial variability, urgency, or mixture models may further improve absolute fit.

9 Supplementary Figures

9.1 S1. Conditional Accuracy Function (CAF)

Conditional Accuracy Function (CAF). Empirical accuracy by RT bin for each Task × Effort × Difficulty combination. Shows the speed–accuracy tradeoff: faster responses (lower bins) tend toward chance accuracy, while slower responses show higher accuracy, consistent with evidence accumulation over time.
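A conditional accuracy function of this kind can be computed as sketched below in plain Python. The binning scheme (equal-count bins ordered from fastest to slowest) and the number of bins are assumptions, since the caption does not specify them, and the data are toy values.

```python
def conditional_accuracy(rts, correct, n_bins=5):
    """Accuracy within equal-count RT bins, ordered fastest to slowest.

    rts: response times in seconds; correct: 0/1 accuracy per trial.
    Requires at least n_bins trials so that no bin is empty.
    """
    order = sorted(range(len(rts)), key=lambda i: rts[i])
    bins = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        idx = order[start:stop]
        bins.append({
            "mean_rt": sum(rts[i] for i in idx) / len(idx),
            "accuracy": sum(correct[i] for i in idx) / len(idx),
        })
    return bins

# Toy data for one hypothetical Task x Effort x Difficulty cell.
toy_rts = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
toy_correct = [0, 1, 0, 1, 1, 1, 1, 1, 1, 1]

caf = conditional_accuracy(toy_rts, toy_correct, n_bins=5)
accuracies = [b["accuracy"] for b in caf]
print(accuracies)
```

In practice the function would be applied separately within each cell, and preferably within subject before averaging, for the same aggregation reasons discussed for the PPCs.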

9.2 S2. PPC Residual Heatmaps

PPC Residual Heatmaps. KS statistic and QP RMSE by Task × Effort × Difficulty for all models (top panel) and primary model only (bottom panel). Darker red indicates larger residuals (poorer fit). For the primary model, residuals are largest in Easy cells, with the most pronounced misfit in Easy/VDT conditions.

9.2.1 Heatmap Detail Tables

PPC Residual Heatmap (Wide Format)

| Task | Effort | Difficulty | KS Statistic | QP RMSE (s) |
|---|---|---|---|---|
| ADT | Low_5_MVC | Standard | 0.109 | 0.208 |
| ADT | Low_5_MVC | Hard | 0.126 | 0.147 |
| ADT | Low_5_MVC | Easy | 0.191 | 0.367 |
| ADT | High_MVC | Standard | 0.173 | 0.165 |
| ADT | High_MVC | Hard | 0.104 | 0.120 |
| ADT | High_MVC | Easy | 0.185 | 0.349 |
| VDT | Low_5_MVC | Standard | 0.144 | 0.303 |
| VDT | Low_5_MVC | Hard | 0.122 | 0.256 |
| VDT | Low_5_MVC | Easy | 0.265 | 0.469 |
| VDT | High_MVC | Standard | 0.221 | 0.300 |
| VDT | High_MVC | Hard | 0.101 | 0.234 |
| VDT | High_MVC | Easy | 0.241 | 0.445 |

9.3 S3. Unconditional Pooled PPC Metrics (Reference)

This table reports metrics from the strict unconditional pooled test (censored 2%), provided for completeness. As noted in the text, this pooled test is highly sensitive to small deviations in fast tails; the primary gate is the subject-wise mid-body PPC (target: ≤ 15% of cells flagged; Section 4.2), complemented by the joint model cell-wise PPCs (Standard/Easy good, VDT-Hard modest misfit).

Pooled PPC Gate Summary (Strict Test)

| N Cells | % Flagged | Max QP RMSE (s) | Max KS |
|---|---|---|---|
| 12 | 100 | 0.469 | 0.265 |

9.4 S4. Sensitivity Analysis: Exclusion of Sub-Chance Participants

To verify that the inclusion of 12 participants who performed at or below chance (≤55% accuracy) in some conditions did not bias our main findings, we conducted sensitivity analyses comparing the primary model (N=67) with models fit after excluding these participants (N=55).

Method: We refit the primary model (Model3_Difficulty) and an additive model (Model4_Additive) on the reduced dataset excluding sub-chance participants. Parameter estimates were compared using delta (sensitivity - baseline) with conservative 95% credible intervals. If the delta CI includes zero and the baseline and sensitivity CIs overlap, we conclude the parameter is robust to exclusion.
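The decision rule in this paragraph can be written down as a short plain-Python sketch. The delta intervals in the example are taken from the Model3_Difficulty rows of the sensitivity table; the baseline and sensitivity intervals passed to the overlap check are hypothetical, because only the delta intervals are reported. Function names are illustrative.

```python
def ci_contains_zero(lower, upper):
    """True if the interval [lower, upper] includes zero."""
    return lower <= 0.0 <= upper

def intervals_overlap(ci_a, ci_b):
    """True if two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

def is_robust(delta_ci, baseline_ci, sensitivity_ci):
    """Rule from the text: the delta CI includes zero AND the baseline and
    sensitivity CIs overlap."""
    return ci_contains_zero(*delta_ci) and intervals_overlap(baseline_ci, sensitivity_ci)

# Delta CIs as reported for Model3_Difficulty.
boundary_intercept_delta_ci = (-0.081, 0.106)
drift_hard_delta_ci = (-0.183, -0.020)

# Hypothetical baseline/sensitivity CIs, used only to exercise the rule.
example_baseline_ci = (0.60, 0.80)
example_sensitivity_ci = (0.62, 0.82)

boundary_robust = is_robust(boundary_intercept_delta_ci,
                            example_baseline_ci, example_sensitivity_ci)
hard_robust = is_robust(drift_hard_delta_ci,
                        example_baseline_ci, example_sensitivity_ci)
print(boundary_robust, hard_robust)
```

Requiring both conditions makes the rule conservative: a parameter is called robust only when the shift is indistinguishable from zero and the two fits also agree on its plausible range.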

Sensitivity Analysis: Excluding Sub-Chance Participants

| Model | Parameter | Baseline | Excluded | Δ | Δ CI Lower | Δ CI Upper | CI Overlap | Δ Contains 0¹ |
|---|---|---|---|---|---|---|---|---|
| Additive (v + a + z) | Boundary (a): Intercept | 0.704 | 0.715 | 0.010 | -0.082 | 0.101 | TRUE | TRUE |
| Additive (v + a + z) | Drift (v): Difficulty: Hard | -1.534 | -1.635 | -0.101 | -0.184 | -0.019 | FALSE | FALSE |
| Additive (v + a + z) | Drift (v): Difficulty: Standard | -0.182 | -0.287 | -0.106 | -0.200 | -0.009 | FALSE | FALSE |
| Additive (v + a + z) | Drift (v): Effort: Low_5_MVC | 0.042 | 0.046 | 0.004 | -0.063 | 0.073 | TRUE | TRUE |
| Additive (v + a + z) | Drift (v): Intercept | 0.993 | 1.134 | 0.142 | -0.034 | 0.328 | TRUE | TRUE |
| Difficulty → v (drift) | Boundary (a): Intercept | 0.704 | 0.717 | 0.012 | -0.081 | 0.106 | TRUE | TRUE |
| Difficulty → v (drift) | Drift (v): Difficulty: Hard | -1.533 | -1.635 | -0.101 | -0.183 | -0.020 | FALSE | FALSE |
| Difficulty → v (drift) | Drift (v): Difficulty: Standard | -0.182 | -0.287 | -0.105 | -0.202 | -0.008 | FALSE | FALSE |
| Difficulty → v (drift) | Drift (v): Intercept | 1.016 | 1.156 | 0.141 | -0.037 | 0.322 | TRUE | TRUE |

¹ Baseline: N=67 (includes sub-chance). Excluded: N=55 (excludes sub-chance). Δ = Excluded - Baseline. If Δ CI contains 0 and CIs overlap, parameter is robust.

Results: Most key parameters showed robust estimates when excluding sub-chance participants. For Model3_Difficulty and Model4_Additive, the drift intercept and the boundary separation intercept had delta CIs that included zero, indicating no meaningful change; the same held for the effort effect on drift in Model4_Additive. The Hard and Standard difficulty effects on drift each shifted by Δ ≈ -0.10, with delta CIs that excluded zero. For Hard, this is a small change in magnitude (~6.6% of the baseline estimate) and does not alter the substantive conclusion that Hard trials show lower drift than the reference difficulty level. For Standard, the same absolute shift is large relative to the small baseline estimate (-0.18), although the sign of the effect is unchanged. Conclusion: The inclusion of sub-chance participants did not change the direction of any main effect or the substantive conclusions, supporting our decision to retain all 67 participants to maximize sample size and leverage hierarchical modeling’s ability to stabilize estimates through shrinkage.
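As a quick check on the relative size of these shifts, the absolute shift can be divided by the absolute baseline estimate, using the Model3_Difficulty rows of the sensitivity table. Here \(\hat{\beta}\) denotes the baseline posterior estimate; this notation is introduced only for this illustration.

\[
\frac{|\Delta_{\text{Hard}}|}{|\hat{\beta}_{\text{Hard}}|} = \frac{0.101}{1.533} \approx 0.066,
\qquad
\frac{|\Delta_{\text{Standard}}|}{|\hat{\beta}_{\text{Standard}}|} = \frac{0.105}{0.182} \approx 0.58
\]

The Hard shift is therefore roughly 7% of its baseline estimate, whereas the Standard shift, although similar in absolute size, is more than half of its much smaller baseline estimate.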

References

Allaire, J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., & Iannone, R. (2022). Quarto. https://quarto.org
Aston-Jones, G., & Cohen, J. D. (2005). An integrative theory of locus coeruleus–norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience, 28, 403–450. https://doi.org/10.1146/annurev.neuro.28.061604.135709
Azer, L., Xie, W., Park, H.-B., & Zhang, W. (2023). Detrimental effects of effortful physical exertion on a working memory dual-task in older adults. Psychology and Aging, 38(4), 291–304. https://doi.org/10.1037/pag0000730
Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91(2), 276–292. https://doi.org/10.1037/0033-2909.91.2.276
Birren, J. E. (1965). Age changes in speed of behavior: Its central nature and physiological correlates. Behavior, Aging, and the Nervous System, 191–216.
Brinley, J. F. (1965). Cognitive sets, speed and accuracy of performance in the elderly. Behavior, Aging and the Nervous System, 114, 49.
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017
Bürkner, P.-C., Gabry, J., Kay, M., & Vehtari, A. (2022). posterior: Tools for working with posterior distributions. Journal of Statistical Software, 102(10), 1–37. https://doi.org/10.18637/jss.v102.i10
Cavanagh, J. F., Wiecki, T. V., Kochar, A., & Frank, M. J. (2014). Eye tracking and pupillometry are indicators of dissociable latent decision processes. Journal of Experimental Psychology: General, 143(4), 1476–1488. https://doi.org/10.1037/a0035813
Cerella, J. (1985). Information processing rates in the elderly. Psychological Bulletin, 98(1), 67–83. https://doi.org/10.1037/0033-2909.98.1.67
Eldar, E., Cohen, J. D., & Niv, Y. (2013). The effects of neural gain on attention and learning. Nature Neuroscience, 16(8), 1146–1153. https://doi.org/10.1038/nn.3428
Feller, W. (1968). An introduction to probability theory and its applications (3rd ed., Vol. 1). John Wiley & Sons.
Frank, M. J. (2006). Hold your horses: A dynamic computational role for the subthalamic nucleus in decision making. Neural Networks, 19(8), 1120–1136. https://doi.org/10.1016/j.neunet.2006.03.006
Gabry, J., & Češnovar, R. (2021). Cmdstanr: R interface to ’CmdStan’. https://doi.org/10.32614/RJ-2021-123
Gee, J. W. de, Tsetsos, K., Schwabe, L., Urai, A. E., & Donner, T. H. (2020). Pupil-linked phasic arousal predicts a reduction of choice bias across species and decision domains. eLife, 9, e54014. https://doi.org/10.7554/eLife.54014
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472. https://doi.org/10.1214/ss/1177011136
Gilzenrat, M. S., Nieuwenhuis, S., Jepma, M., & Cohen, J. D. (2010). Pupil diameter tracks changes in control state predicted by the adaptive gain theory of locus coeruleus function. Cognitive, Affective, & Behavioral Neuroscience, 10(2), 252–269. https://doi.org/10.3758/CABN.10.2.252
Hess, T. M., & Ennis, G. (2012). Age differences in the effort and costs associated with cognitive activity. Journal of Gerontology: Psychological Sciences, 67B(4), 447–455. https://doi.org/10.1093/geronb/gbr129
Huang, R., & Clewett, D. (2024). The locus coeruleus: Where cognitive and emotional processing meet the eye. In Modern pupillometry (pp. 3–75). Springer International Publishing.
Iannone, R., Cheng, J., Schloerke, B., Hughes, E., Lauer, A., & Seo, J. (2024). Gt: Easily create presentation-ready display tables. https://CRAN.R-project.org/package=gt
Joshi, S., Li, Y., Kalwani, R. M., & Gold, J. I. (2016). Relationships between pupil diameter and neuronal activity in the locus coeruleus, colliculi, and cingulate cortex. Neuron, 89(1), 221–234. https://doi.org/10.1016/j.neuron.2015.11.028
Kahneman, D., & Beatty, J. (1966). Pupil diameter and load on memory. Science, 154(3756), 1583–1585. https://doi.org/10.1126/science.154.3756.1583
Lee, T.-H., Greening, S. G., Ueno, T., Clewett, D., Ponzio, A., Sakaki, M., & Mather, M. (2018). Arousal increases neural gain via the locus coeruleus-noradrenaline system in younger adults but not in older adults. Nature Human Behaviour, 2(5), 356–366. https://doi.org/10.1038/s41562-018-0344-1
Li, S.-C., Lindenberger, U., & Sikström, S. (2001). Aging cognition: From neuromodulation to representation. Trends in Cognitive Sciences, 5(11), 479–486.
Lindenberger, U., & Baltes, P. B. (1994). Sensory functioning and intelligence in old age: A strong connection. Psychology and Aging, 9(3), 339.
Madden, D. J., & Allen, P. A. (1991). Adult age differences in the rate of information extraction during visual search. Journal of Gerontology, 46(3), P124–P126. https://doi.org/10.1093/geronj/46.3.P124
Mata, R., Schooler, L. J., & Rieskamp, J. (2007). The aging decision maker: Cognitive aging and the adaptive selection of decision strategies. Psychology and Aging, 22(4), 796.
Mather, M., Clewett, D., Sakaki, M., & Harley, C. W. (2016). Norepinephrine ignites local hotspots of neuronal excitation: How arousal amplifies selectivity in perception and memory. Behavioral and Brain Sciences, 39, e200. https://doi.org/10.1017/S0140525X15000667
Mather, M., & Harley, C. W. (2016). The locus coeruleus: Essential for maintaining cognitive function and the aging brain. Trends in Cognitive Sciences, 20(3), 214–226. https://doi.org/10.1016/j.tics.2016.01.001
Mikneviciute, G., Ballhausen, N., Rimmele, U., & Kliegel, M. (2022). Does older adults’ cognition particularly suffer from stress? A systematic review of acute stress effects on cognition in older age. Neuroscience & Biobehavioral Reviews, 132, 583–602. https://doi.org/10.1016/j.neubiorev.2021.12.010
Murphy, P. R., O’Connell, R. G., O’Sullivan, R., Robertson, I. H., & Balsters, J. H. (2014). Pupil diameter covaries with BOLD activity in human locus coeruleus. Human Brain Mapping, 35(8), 4140–4154. https://doi.org/10.1002/hbm.22466
R Core Team. (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rabbitt, P. (1979). How old and young subjects monitor and control responses for accuracy and speed. British Journal of Psychology, 70(2), 305–311.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108. https://doi.org/10.1037/0033-295X.85.2.59
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922. https://doi.org/10.1162/neco.2008.12-06-420
Ratcliff, R., Thapar, A., & McKoon, G. (2001). The effects of aging on reaction time in a signal detection task. Psychology and Aging, 16(2), 323–341. https://doi.org/10.1037/0882-7974.16.2.323
Ratcliff, R., Thapar, A., & McKoon, G. (2003). A diffusion model analysis of the effects of aging on brightness discrimination. Perception & Psychophysics, 65(4), 523–535.
Ratcliff, R., Thapar, A., & McKoon, G. (2004). A diffusion model analysis of the effects of aging in the lexical-decision task. Psychology and Aging, 19(2), 278–289. https://doi.org/10.1037/0882-7974.19.2.278
Ratcliff, R., Thapar, A., & McKoon, G. (2006). Aging and individual differences in rapid two-choice decisions. Psychonomic Bulletin & Review, 13(4), 626–635.
Ratcliff, R., Thapar, A., & McKoon, G. (2007). Application of the diffusion model to two-choice tasks for adults 75-90 years old. Psychology and Aging, 22(1), 56.
Ratcliff, R., & Tuerlinckx, F. (2002). Estimating parameters of the diffusion model: Approaches to dealing with contaminant reaction times and parameter variability. Psychonomic Bulletin & Review, 9(3), 438–481. https://doi.org/10.3758/BF03196302
Reuter-Lorenz, P. A., & Cappell, K. A. (2008). Neurocognitive aging and the compensation hypothesis. Current Directions in Psychological Science, 17(3), 177–182. https://doi.org/10.1111/j.1467-8721.2008.00570.x
Salthouse, T. A. (1985). Speed of behavior and its implications for cognition.
Salthouse, T. A. (1996). The processing-speed theory of adult age differences in cognition. Psychological Review, 103(3), 403–428. https://doi.org/10.1037/0033-295X.103.3.403
Seidler, R. D., Bernard, J. A., Burutolu, T. B., Fling, B. W., Gordon, M. T., Gwin, J. T., Kwak, Y., & Lipps, D. B. (2010). Motor control and aging: Links to age-related brain structural, functional, and biochemical effects. Neuroscience & Biobehavioral Reviews, 34(5), 721–733.
Spaniol, J., Madden, D. J., & Voss, A. (2006). A diffusion model analysis of adult age differences in episodic and semantic long-term memory retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(1), 101–117. https://doi.org/10.1037/0278-7393.32.1.101
Stan Development Team. (2024). CmdStan user’s guide (Version 2.34). https://mc-stan.org/docs/cmdstan-guide/
Starns, J. J., & Ratcliff, R. (2010). The effects of aging on the speed–accuracy compromise: Boundary optimality in the diffusion model. Psychology and Aging, 25(2), 377–390. https://doi.org/10.1037/a0018022
Urai, A. E., Braun, A., & Donner, T. H. (2017). Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications, 8(1), 14637. https://doi.org/10.1038/ncomms14637
Van Gerven, P. W., Paas, F., Van Merriënboer, J. J., & Schmidt, H. G. (2004). Memory load and the cognitive pupillary response in aging. Psychophysiology, 41(2), 167–174. https://doi.org/10.1111/j.1469-8986.2003.00148.x
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved \(\widehat{R}\) for assessing convergence of MCMC (with discussion). Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
Verhaeghen, P., Steitz, D. W., Sliwinski, M. J., & Cerella, J. (2003). Aging and dual-task performance: A meta-analysis. Psychology and Aging, 18(3), 443–460. https://doi.org/10.1037/0882-7974.18.3.443
Voss, A., Rothermund, K., & Voss, J. (2004). Interpreting the parameters of the diffusion model: An empirical validation. Memory & Cognition, 32(7), 1206–1220.
Wel, P. van der, & Steenbergen, H. van. (2018). Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review, 25(6), 2005–2015. https://doi.org/10.3758/s13423-018-1432-y
Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3), 475–482. https://doi.org/10.1007/BF03395630
Wickens, C. D. (2008). Multiple resources and mental workload. Human Factors, 50(3), 449–455. https://doi.org/10.1518/001872008X288394
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation. https://CRAN.R-project.org/package=dplyr
Wickham, H., Hester, J., François, R., & Bryan, J. (2024). readr: Read rectangular text data. https://CRAN.R-project.org/package=readr
Wickham, H., Vaughan, D., & Girlich, M. (2023). tidyr: Tidy messy data. https://CRAN.R-project.org/package=tidyr
Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14. https://doi.org/10.3389/fninf.2013.00014
Woods, D. L., Wyma, J. M., Yund, E. W., Herron, T. J., & Reed, B. (2015). Age-related slowing of response selection and production in a visual choice reaction time task. Frontiers in Human Neuroscience, 9, 193. https://doi.org/10.3389/fnhum.2015.00193
Woollacott, M., & Shumway-Cook, A. (2002). Attention and the control of posture and gait: A review of an emerging area of research. Gait & Posture, 16(1), 1–14. https://doi.org/10.1016/S0966-6362(01)00156-4
Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18(5), 459–482. https://doi.org/10.1002/cne.920180503

Note: The LC behavioral report manuscript (in preparation/published) describes the behavioral dataset and methodology used in this analysis. Full citation details will be added when available.